Adaptive Query Execution: Speeding Up Spark SQL at Runtime

preview_player
Показать описание
Over the years, there has been extensive and continuous effort on improving Spark SQL’s query optimizer and planner, in order to generate high quality query execution plans. One of the biggest improvements is the cost-based optimization framework that collects and leverages a variety of data statistics (e.g., row count, number of distinct values, NULL values, max/min values, etc.) to help Spark make better decisions in picking the most optimal query plan. Examples of these cost-based optimizations include choosing the right join type (broadcast-hash-join vs. sort-merge-join), selecting the correct build side in a hash-join, or adjusting the join order in a multi-way join. However, chances are data statistics can be out of date and cardinality estimates can be inaccurate, which may lead to a less optimal query plan. Adaptive Query Execution, new in Spark 3.0, now looks to tackle such issues by re-optimizing and adjusting query plans based on runtime statistics collected in the process of query execution. This talk is going to introduce the adaptive query execution framework along with a few optimizations it employs to address some major performance challenges the industry faces when using Spark SQL. We will illustrate how these statistics-guided optimizations work to accelerate execution through query examples. Finally, we will share the significant performance improvement we have seen on the TPC-DS benchmark with Adaptive Query Execution.

About:
Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
Connect with us:
Рекомендации по теме
Комментарии
Автор

really interesting improvisation to partitioning :-)

rphan
Автор

Thanks for the effort, great contents!

Ragdoll_Meow
Автор

Do you have the notebook in public where we can check the features on our own clusters? Thank you

sid
Автор

Awesome speech, thanks for uploading, where could I get these slides ?

unikrau
Автор

Can anyone provide me a scientific source about AQE which I can use for my masters thesis? Unfortunately all I can find are blog articles or this youtube Video. A paper would be great.

thomashass