From Query Plan to Performance: Supercharging your Apache Spark Queries using the Spark UI SQL Tab

The SQL tab in the Spark UI provides a lot of information for analysing your Spark queries, ranging from the query plan to all of its associated statistics. However, many new Spark practitioners are overwhelmed by the information presented and have trouble putting it to use. In this talk we give a gentle introduction to reading the SQL tab. We will first go over the common Spark operations, such as scans, projects, filters, aggregations and joins, and how they relate to the Spark code you write. In the second part of the talk we will show how to read the associated statistics to pinpoint performance bottlenecks.
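
The operators listed above form a tree. Scans sit at the leaves, rows flow up through filters, projects and joins to a final aggregation, and the SQL tab annotates each node with statistics such as the number of output rows. The plain-Python toy below is a minimal sketch of that idea, not Spark's implementation or API. The operator classes, the `orders`/`customers` tables and the `build_plan`/`describe` helpers are all hypothetical. It builds the same query twice, once filtering after the join and once before it, so the per-node row counts show where rows are produced and thrown away.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]

class Operator:
    """A toy plan node (not Spark's): yields rows and counts how many it emits."""
    def __init__(self, name: str, children: List["Operator"]) -> None:
        self.name, self.children = name, children
        self.output_rows = 0  # plays the role of the "number of output rows" metric

    def rows(self) -> Iterator[Row]:
        raise NotImplementedError

    def execute(self) -> Iterator[Row]:
        for row in self.rows():
            self.output_rows += 1
            yield row

class Scan(Operator):
    def __init__(self, table: str, data: List[Row]) -> None:
        super().__init__(f"Scan {table}", [])
        self.data = data

    def rows(self) -> Iterator[Row]:
        yield from self.data

class Filter(Operator):
    def __init__(self, child: Operator, predicate: Callable[[Row], bool], label: str) -> None:
        super().__init__(f"Filter {label}", [child])
        self.predicate = predicate

    def rows(self) -> Iterator[Row]:
        for row in self.children[0].execute():
            if self.predicate(row):
                yield row

class Project(Operator):
    def __init__(self, child: Operator, columns: List[str]) -> None:
        super().__init__(f"Project {', '.join(columns)}", [child])
        self.columns = columns

    def rows(self) -> Iterator[Row]:
        for row in self.children[0].execute():
            yield {c: row[c] for c in self.columns}

class HashJoin(Operator):
    """Inner equi-join: hash the right child, then probe it with each left row."""
    def __init__(self, left: Operator, right: Operator, key: str) -> None:
        super().__init__(f"HashJoin on {key}", [left, right])
        self.key = key

    def rows(self) -> Iterator[Row]:
        table: Dict[Any, List[Row]] = defaultdict(list)
        for right_row in self.children[1].execute():
            table[right_row[self.key]].append(right_row)
        for left_row in self.children[0].execute():
            for right_row in table.get(left_row[self.key], []):
                yield {**left_row, **right_row}

class HashAggregate(Operator):
    def __init__(self, child: Operator, group_by: str, sum_col: str) -> None:
        super().__init__(f"HashAggregate by {group_by}", [child])
        self.group_by, self.sum_col = group_by, sum_col

    def rows(self) -> Iterator[Row]:
        totals: Dict[Any, int] = defaultdict(int)  # GROUP BY group_by, SUM(sum_col)
        for row in self.children[0].execute():
            totals[row[self.group_by]] += row[self.sum_col]
        for key, total in totals.items():
            yield {self.group_by: key, f"sum({self.sum_col})": total}

# Hypothetical tables: 4 customers in two countries, 20 orders.
CUSTOMERS = [{"customer_id": i, "country": "NL" if i % 2 == 0 else "BE"} for i in range(4)]
ORDERS = [{"order_id": i, "customer_id": i % 4, "amount": 10 * i} for i in range(20)]

def build_plan(push_filter_down: bool) -> Operator:
    """Total amount per country for orders above 100; filter before or after the join."""
    orders: Operator = Scan("orders", ORDERS)
    if push_filter_down:
        orders = Filter(orders, lambda r: r["amount"] > 100, "amount > 100")
    joined: Operator = HashJoin(orders, Scan("customers", CUSTOMERS), "customer_id")
    if not push_filter_down:
        joined = Filter(joined, lambda r: r["amount"] > 100, "amount > 100")
    return HashAggregate(Project(joined, ["country", "amount"]), "country", "amount")

def describe(plan: Operator, depth: int = 0) -> List[str]:
    """Indented plan tree, root first, with each node's output row count."""
    lines = [f"{'  ' * depth}{plan.name}: {plan.output_rows} rows"]
    for child in plan.children:
        lines.extend(describe(child, depth + 1))
    return lines

if __name__ == "__main__":
    for pushed in (False, True):
        plan = build_plan(push_filter_down=pushed)
        print(list(plan.execute()))  # executing the root pulls rows through every node
        print("\n".join(describe(plan)))
```

Both plans return the same two result rows (BE 750, NL 600), but the join node reports 20 output rows when the filter sits above it and 9 when the filter is pushed below it. Spark's optimizer normally pushes simple predicates like this below a join by itself; the toy only illustrates how per-node row counts reveal where work is done and then discarded.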

After attending this session you will have a better grasp of query plans and the SQL tab, and will be able to use this knowledge to improve the performance of your Spark queries.

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments

Is there a repository that walks through real-world examples of badly vs. well-written Spark SQL?

LearnShare

Why isn't HashMergeJoin mentioned in the presentation?

aviyehuda

Why is a Spark query translated into multiple Spark jobs?

aviyehuda