Optimizing Apache Spark on Databricks


Course Overview

Apache Spark is powerful, but only when it is optimized. Most Spark performance issues come down to a handful of root causes: shuffle, skew, spill, serialization, and storage. In this two-day course, you’ll learn to diagnose and resolve each of them using the Spark UI, targeted optimization techniques, and features introduced in Spark 3.x.

The course also covers how to optimize query execution, manage shuffle partition issues, and structure data using Delta Lake, partitioning strategies, and data skipping. You’ll apply these skills hands-on to improve the performance of real-world workloads and to design clusters suited to them. Whether you’re tuning SQL queries or preparing large-scale machine learning pipelines, this course will help you get the most out of Spark and Databricks.

Course Objectives

    By the end of this course, you’ll be able to identify and fix common Spark performance bottlenecks. You’ll also understand how to apply Spark 3.x features and cluster design strategies to improve efficiency.

    • Diagnose skew, spill, shuffle, storage, and serialization issues
    • Use the Spark UI to investigate performance bottlenecks
    • Apply performance tuning techniques during data ingestion
    • Use features such as Z-ordering, bucketing, and Adaptive Query Execution (AQE)
    • Configure a Databricks cluster for optimal Spark performance

Course Outline

Day 1: Understanding and Diagnosing Performance Issues

  • Spark architecture and Spark UI
  • Skew and data imbalance
  • Spill and memory issues
  • Shuffle mechanics
  • Storage formats and tuning
  • Serialization performance

Day 2: Optimizing and Scaling Spark Workloads

  • Data ingestion: partitioning, predicate pushdown
  • Z-ordering and bucketing strategies
  • Adaptive Query Execution (AQE)
  • Designing clusters for specific workloads
  • Hands-on optimization labs using Databricks


Class Dates & Times

Class times are listed in Eastern Time.

This is a two-day class.

Please contact Akhil for custom class pricing:
at@tmbsusa.com | +1 (908) 334-4476

Class dates are not currently listed.
Please contact us for available dates and times.