An Adaptive Execution Engine For Apache Spark SQL - Carson Wang

"Catalyst is an excellent optimizer in Spark SQL, providing an open interface for rule-based optimization at the planning stage. However, this static, rule-based optimization cannot take the runtime data distribution into account. A feature called Adaptive Execution, introduced in Spark 2.0 to address this gap, is still at an early stage. We enhanced the existing Adaptive Execution feature, focusing on adjusting the execution plan at runtime according to each stage's intermediate output: setting partition numbers for joins and aggregations, avoiding unnecessary data shuffling and disk I/O, handling data skew, and even optimizing the join order as a cost-based optimizer (CBO) would. In our benchmark experiments, this feature saved huge manual effort in tuning parameters such as the shuffle partition number, which is error-prone and misleading. In this talk, we will present the new adaptive execution framework, task scheduling, the failover retry mechanism, runtime plan switching, and more. Finally, we will share our experience of benchmarking TPCx-BB at 100-300 TB scale on a bare-metal Spark cluster of hundreds of nodes.

Session hashtag: EUdev4"
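The runtime plan adjustments described in the abstract are driven by configuration. A minimal sketch of the relevant settings, assuming the property names from Spark 2.x's built-in adaptive execution (the enhanced version presented in this talk may expose additional or differently named keys; verify against your build):

```
# spark-defaults.conf — sketch, property names as in Spark 2.x built-in adaptive execution
# Turn on adaptive execution so post-shuffle partition counts are decided at runtime
spark.sql.adaptive.enabled                                 true
# Target bytes per post-shuffle partition; small partitions are coalesced toward this size
spark.sql.adaptive.shuffle.targetPostShuffleInputSize      67108864
# Lower bound on the number of post-shuffle partitions after coalescing
spark.sql.adaptive.minNumPostShufflePartitions             1
```

With these set, a fixed `spark.sql.shuffle.partitions` value no longer has to be hand-tuned per query, which is the manual, error-prone step the talk aims to eliminate.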

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.

Comments

Great contribution, thank you! This works well together with Dynamic Resource Allocation in Spark 2.4.4. For smaller files, when the default parallelism is used, 200 tasks are created in a join operation and DRA maxes out. This results in low efficiency, since each executor processes just a tiny partition. Enabling Adaptive Execution keeps resource usage much more reasonable. Also, the output gets written to fewer, larger files, which is good for HDFS.

DzmitryMakatun

What's the way to activate adaptive shuffle partitioning in Spark 2.2?

ebertoburgos
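Regarding the question above: stock Spark 2.2 already ships a limited form of adaptive execution that can coalesce shuffle partitions at runtime. A sketch of activating it at submit time, assuming the standard 2.x property names (the application JAR name here is a placeholder):

```
spark-submit \
  --conf spark.sql.adaptive.enabled=true \
  --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=134217728 \
  your-app.jar   # placeholder for your application
```

Note that the richer features discussed in the talk (skew handling, runtime join-order changes) come from the enhanced framework rather than stock 2.2, so those require the patched build described by the speakers.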