Seattle Spark + AI Meetup: How Apache Spark™ 3.0 and Delta Lake Enhance Data Lake Reliability

Apache Spark™ has become the de facto open-source standard for big data processing thanks to its ease of use and performance. The open-source Delta Lake project improves Spark’s data reliability with new capabilities such as ACID transactions, Schema Enforcement, and Time Travel.

Join us in this meetup to learn more about the performance improvements in Apache Spark 3.0 including Adaptive Query Execution (AQE), Dynamic Partition Pruning (DPP), and handling skewed queries!

Topics to be covered include:

* The new Adaptive Query Execution (AQE) framework within Spark 3.0 can yield query performance gains. Based on a 3TB TPC-DS benchmark, two queries had more than a 1.5x speedup, and another 37 queries had more than a 1.1x speedup.
* With Dynamic Partition Pruning (DPP), we can significantly speed up performance by pruning partitions based on the joins between the fact and dimension tables common in star schema design.
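
As a rough illustration of the two topics above, the sketch below lists the Spark SQL configuration keys that switch on AQE, skew-join handling, and DPP, and then models the idea behind partition pruning in plain Python. The pruning function is a toy, not Spark's implementation, and the table names, rows, and helper names (`to_submit_args`, `prune_fact_partitions`) are hypothetical, chosen only for this example.

```python
from typing import Any, Callable, Dict, List

# Spark SQL configuration keys relevant to the talk. The values are
# illustrative; in Spark 3.0, AQE stays off unless enabled explicitly.
SPARK_CONF: Dict[str, str] = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.optimizer.dynamicPartitionPruning.enabled": "true",
}


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render settings as `--conf key=value` arguments for spark-submit."""
    args: List[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


Row = Dict[str, Any]


def prune_fact_partitions(
    fact_partitions: Dict[str, List[Row]],
    dim_rows: List[Row],
    dim_filter: Callable[[Row], bool],
    join_key: str,
) -> Dict[str, List[Row]]:
    """Toy model of DPP: keep only the fact partitions whose partition value
    matches a join key that survives the dimension-table filter."""
    surviving_keys = {row[join_key] for row in dim_rows if dim_filter(row)}
    return {
        part: rows
        for part, rows in fact_partitions.items()
        if part in surviving_keys
    }


# Hypothetical star schema: sales facts partitioned by date_id,
# joined to a small date dimension that is filtered on year.
dim_date = [
    {"date_id": "d1", "year": 2019},
    {"date_id": "d2", "year": 2020},
    {"date_id": "d3", "year": 2020},
]
fact_sales = {
    "d1": [{"amount": 10}],
    "d2": [{"amount": 20}, {"amount": 5}],
    "d3": [{"amount": 7}],
}

# Only the partitions for 2020 are read; partition d1 is skipped entirely.
scanned = prune_fact_partitions(
    fact_sales, dim_date, lambda r: r["year"] == 2020, "date_id"
)
total = sum(row["amount"] for rows in scanned.values() for row in rows)
print(sorted(scanned), total)
```

The saving comes from never reading the skipped partitions at all, which is why the technique pays off most when a selective filter on a small dimension table is joined to a large partitioned fact table.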

Databricks

Comments

Thank you for sharing these improvements in Spark 3!

datrumpet

Does this mean from Spark 3.0 with AQE turned on, there is no need to manually calculate statistics with the "ANALYZE TABLE ..." idiom?

nilanjansarkar
