95% reduction in Apache Spark processing time with correct usage of repartition() function

Hello Friends,

In this video I demonstrate how to reduce processing time by more than 95% with correct usage of the repartition() function in Apache Spark.

If we repartition() the data before running join or aggregation queries, it reduces the amount of shuffle read/write data, and as a result processing is much faster.
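To make the idea concrete, here is a minimal sketch in plain Python (not Spark code) that models partitions as lists and counts how many records would have to change partition before a group-by-key could run. The helper names (`target_partition`, `records_to_move`, `repartition_by_key`) and the sample data are made up for illustration, and CRC32 stands in for whatever hash a real partitioner uses. Note that the one-time repartition is itself a shuffle, so the saving shows up in the queries that reuse the same key afterwards.

```python
import zlib


def target_partition(key, num_partitions):
    # Deterministic hash partitioning: same key always maps to the same partition.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def records_to_move(partitions):
    # Count records that are not yet in the partition their key hashes to.
    n = len(partitions)
    return sum(
        1
        for idx, part in enumerate(partitions)
        for key, _ in part
        if target_partition(key, n) != idx
    )


def repartition_by_key(partitions, num_partitions):
    # One-time redistribution of every record by the hash of its key.
    out = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            out[target_partition(key, num_partitions)].append((key, value))
    return out


# Hypothetical data: 1000 records spread over 50 keys, laid out round-robin.
records = [(f"SYM{i % 50}", i) for i in range(1000)]
round_robin = [records[i::4] for i in range(4)]
by_key = repartition_by_key(round_robin, 4)

moved_before = records_to_move(round_robin)  # records a group-by would have to move
moved_after = records_to_move(by_key)        # already co-located by key
print(moved_before, moved_after)
```

After the repartition, every record of a given key already sits in one partition, so a later aggregation on that key has nothing left to move in this model.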

Also, by increasing the number of partitions, we make each aggregation task smaller and more manageable for the processor, thereby reducing the processing time.
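The effect of the partition count on task size can be sketched the same way. This is again plain Python rather than Spark, with a hypothetical `partition_sizes` helper and made-up keys. It only shows that the largest partition, and so the heaviest single task, gets smaller as the number of partitions grows.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    # Number of records that land in each partition under hash partitioning.
    counts = Counter(zlib.crc32(k.encode("utf-8")) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Hypothetical data: 100,000 records spread over 500 keys.
keys = [f"SYM{i % 500}" for i in range(100_000)]

for n in (4, 16, 64):
    sizes = partition_sizes(keys, n)
    # The largest partition approximates the work of the slowest task.
    print(n, max(sizes))
```

This sketch ignores per-task scheduling overhead, which is why very high partition counts stop helping in practice.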

Thanks.

Rajesh Jakhotia
Spark
Hadoop
Big Data
Machine Learning
Artificial Intelligence

Comments

Is it required to set shuffle partitions in this code? Could you please explain when we should use shuffle partitions, as I am a bit confused between repartition and shuffle partitions. Thank you in advance.

narikinabiillijyotsna

How do we determine the number of partitions to use?

Basket-hbjc