Advancing Spark - Understanding Low Shuffle Merge

Back in Databricks Runtime 9.0 we saw the introduction of a preview "Low Shuffle Merge" feature, but it seemed to go fairly unnoticed. In DBR 10.4, it's now enabled by default and a fully GA part of the platform... but what does it actually do?
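
For anyone still on an older runtime, the preview could be switched on manually. A minimal sketch, assuming a Databricks notebook where `spark` is already in scope; the flag name below is the one documented for the DBR 9.x preview, so double-check it against your runtime's docs:

```python
# Low Shuffle Merge is on by default from DBR 10.4; on DBR 9.x it was an
# opt-in preview enabled via this session-level config flag.
spark.conf.set("spark.databricks.delta.merge.enableLowShuffle", "true")
```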

In this video, Simon walks through the theory of low shuffle merge and what you should expect to see happen, both in your runtime executions and in the data layout before and after the change. Make no mistake, it's a real speed boost for many common patterns, so use it if you can!
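
To ground the theory, here's the shape of a Delta MERGE that benefits: when most rows in the touched files are unmatched, low shuffle merge writes those rows back without shuffling them, so their existing ordering and clustering are preserved. The table and column names (`sales`, `updates`, `order_id`) are made up for illustration:

```python
# A typical upsert pattern: only matched rows need processing; with low
# shuffle merge, the unmatched rows in rewritten files skip the shuffle
# and keep their existing layout.
spark.sql("""
    MERGE INTO sales AS t
    USING updates AS s
      ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```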

And as always, get in touch with Advancing Analytics if you need help on your Lakehouse journey!
Comments

Really useful.
New to Databricks, and your last couple of videos have really helped me understand how it will support some of the key concepts I need, and some of the gotchas that are actually being resolved in the new releases. Thank you.

WastedFury

Apologies - it looks like I wiped out comments when clearing some initial spam. Sorry if anyone's actual comments got dropped!
Simon

AdvancingAnalytics

I've been using it since day 1. It has improved my merges :D

YoussefMrini

Kudos for the whiteboard. You should do it more often!

fb-guer

Thanks for the explanation. I am working on a scenario where the table has 2bn rows (no efficient column to partition on, so I can't use predicate pushdown in the merge) and my batch job runs every hour, loading 1mn rows each time. The merge is now taking upwards of 50 mins. I will try low shuffle merge and also OPTIMIZE with Z-ordering (once daily). Can you suggest any other optimization techniques?

Vikasptl
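
For a scenario like the one above, two techniques often help alongside low shuffle merge: a coarse static predicate on the target in the ON clause so the merge only rewrites recent files, and a periodic Z-order on the merge key so matching rows cluster into fewer files. A hedged sketch; the table and column names (`events`, `hourly_batch`, `event_id`, `event_date`) are hypothetical, and the date predicate only works if late-arriving updates really are bounded:

```python
# Bound the files the merge can touch with a static predicate on the
# target, assuming updates only ever land within the last 7 days
# (a hypothetical bound - adjust to the actual arrival pattern).
spark.sql("""
    MERGE INTO events AS t
    USING hourly_batch AS s
      ON t.event_id = s.event_id
     AND t.event_date >= current_date() - INTERVAL 7 DAYS
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Daily maintenance: co-locate rows by the merge key so each incoming
# batch matches against fewer files.
spark.sql("OPTIMIZE events ZORDER BY (event_id)")
```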

Thanks for the explanation. What device are you using for the whiteboarding part?

ArcaLuiNeo

Very time-consuming explanation method.

tarun