2023-02-12 Apache Spark Core—Deep Dive—Proper Optimization Daniel Tomes Databricks
Videos
2023-02-12 Apache Spark Core—Deep Dive—Proper Optimization Daniel Tomes Databricks - YouTube
133,082 views May 6, 2019
Optimizing spark jobs through a true understanding of spark core. Learn: What is a partition? What is the difference between read/shuffle/write partitions? How to increase parallelism and decrease output files? Where does shuffle data go between stages? What is the "right" size for your spark partitions and files? Why does a job slow down with only a few tasks left and never finish? Why doesn't adding nodes decrease my compute time?
2023-02-12 Spark SQL Shuffle Partitions - Spark By {Examples}
spark.sql.shuffle.partitions
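A rough sketch of how one might pick a value for spark.sql.shuffle.partitions instead of leaving it at the default of 200. The helper name, the 128 MB target partition size, and the "round up to a multiple of the core count" step are assumptions for illustration, not something taken from the linked pages; the shuffle input size would have to be read from the Spark UI for the stage in question.

```python
MB = 1024 * 1024


def suggest_shuffle_partitions(shuffle_input_bytes, total_cores,
                               target_partition_bytes=128 * MB):
    """Hypothetical helper: estimate a shuffle partition count.

    Assumes a target size per shuffle partition (128 MB here is only a
    rule-of-thumb placeholder) and rounds the result up to a whole
    multiple of the available cores so every wave of tasks is full.
    """
    if shuffle_input_bytes <= 0 or total_cores <= 0 or target_partition_bytes <= 0:
        raise ValueError("all arguments must be positive")

    # Ceiling division: partitions needed to stay at or below the target size.
    by_size = -(-shuffle_input_bytes // target_partition_bytes)

    # Number of task waves needed on the cluster, rounded up.
    waves = -(-by_size // total_cores)

    return waves * total_cores


# Example: a 210 GB shuffle stage on a cluster with 96 cores.
n = suggest_shuffle_partitions(210 * 1024 * MB, total_cores=96)
print(n)

# With a live SparkSession named `spark`, the value could then be applied with:
#   spark.conf.set("spark.sql.shuffle.partitions", n)
```

The number is only a starting point; skewed keys or wide rows can still call for a different value.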
Slides
2023-02-12 🎥 Apache Spark Core – Practical Optimization Daniel Tomes Databricks - YouTube
This is a talk with similar content but slightly different slides. Many thanks to Daniel Tomes for such amazing content.