Optimizing Apache Spark™ on Databricks
- Offered by Databricks
Course Overview
Duration | 12 hours |
Total fee | ₹1.26 Lakh |
Mode of learning | Online |
Credential | Certificate |
Highlights
- Earn a certificate from Databricks
- Learn from industry experts
Course Details
In this course, you will explore the five key problems that account for the vast majority of performance issues in Apache Spark applications: skew, spill, shuffle, storage, and serialization.
Using examples based on datasets ranging from 100 GB to more than 1 TB, you will investigate and diagnose the sources of bottlenecks with the Spark UI and learn effective mitigation strategies.
You will also discover features introduced in Spark 3 that can automatically address common performance problems.
Finally, you will learn how to design and configure clusters for optimal performance based on your team's specific needs and concerns.
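Skew, the first of the five problems listed above, occurs when one key holds far more records than the others, so the partition that receives it becomes a straggler. The sketch below is a minimal, pure-Python illustration of key salting, one commonly used mitigation; it is not taken from the course, and the helper names (`partition_of`, `salt`, `two_stage_count`) and the dataset are hypothetical. It assumes hash partitioning, using `zlib.crc32` as a simple stand-in for the hash function Spark actually uses.

```python
import random
from collections import Counter
from zlib import crc32

NUM_PARTITIONS = 8

def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic hash partitioning; crc32 is a stand-in, not Spark's own hash."""
    return crc32(key.encode("utf-8")) % num_partitions

def partition_sizes(keys, num_partitions: int = NUM_PARTITIONS) -> list[int]:
    """Number of records that would land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]

def salt(key: str, num_salts: int, rng: random.Random) -> str:
    """Append a random suffix so a single hot key is spread over several partitions."""
    return f"{key}#{rng.randrange(num_salts)}"

def two_stage_count(keys, num_salts: int, rng: random.Random) -> Counter:
    """Count per key in two stages: per salted key first, then per original key."""
    partial = Counter(salt(k, num_salts, rng) for k in keys)  # stage 1: work is spread out
    final = Counter()
    for salted_key, n in partial.items():
        final[salted_key.rsplit("#", 1)[0]] += n  # stage 2: strip the salt and combine
    return final

# Hypothetical skewed dataset: one "hot" key accounts for 90% of the records.
rng = random.Random(42)
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

before = partition_sizes(keys)
after = partition_sizes([salt(k, NUM_PARTITIONS, rng) for k in keys])
print("largest partition before salting:", max(before))
print("largest partition after salting: ", max(after))
```

The trade-off is an extra aggregation step: the first stage runs on evenly sized partitions, and the second stage only combines a handful of partial counts per original key, so the final result matches an unsalted count.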
Curriculum
Day 1
Review of Spark architecture and Spark UI
Skew
Spill
Shuffle
Storage
Serialization
Day 2
Ingestion basics
Predicate pushdown
Disk partitioning
Z-ordering
Bucketing
Optimization with Adaptive Query Execution (AQE)
Designing and configuring clusters for high performance
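Z-ordering, listed on Day 2, clusters rows so that per-file min/max statistics let a reader skip files when filtering on any of several columns. The sketch below is a generic, pure-Python illustration of the Z-order (Morton) curve idea, assuming a hypothetical 8x8 grid of `(x, y)` rows written into four "files" of 16 rows each; it is not Databricks' implementation of Z-ordering, and the helpers `interleave_bits`, `file_stats`, and `files_to_read` are hypothetical.

```python
from itertools import product

def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) value: x supplies the even bits, y the odd bits."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z

def file_stats(rows, rows_per_file: int):
    """Split already-sorted rows into fixed-size 'files' and keep min/max per column."""
    stats = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        stats.append(((min(xs), max(xs)), (min(ys), max(ys))))
    return stats

def files_to_read(stats, column: int, value: int) -> int:
    """Files whose min/max range for `column` (0 = x, 1 = y) could contain `value`."""
    return sum(1 for s in stats if s[column][0] <= value <= s[column][1])

rows = list(product(range(8), range(8)))  # hypothetical 8x8 grid of (x, y) rows
linear = file_stats(sorted(rows), rows_per_file=16)  # plain sort: by x, then y
zorder = file_stats(sorted(rows, key=lambda r: interleave_bits(*r)), rows_per_file=16)

for name, stats in (("linear", linear), ("z-order", zorder)):
    print(name, "x == 2 ->", files_to_read(stats, 0, 2), "files;",
          "y == 2 ->", files_to_read(stats, 1, 2), "files")
```

With the plain sort, a filter on `x` reads one file but a filter on `y` reads all four; with the Z-order layout, each file covers a 4x4 quadrant, so either filter reads two files. The design choice is to give up the best case for one column in exchange for reasonable skipping on both.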