95% reduction in Apache Spark processing time with correct usage of repartition() function

Hello Friends,

In this video I have demonstrated how we can reduce the processing time by more than 95% with correct usage of repartition() function in Apache Spark.

If we repartition() the data before running join or aggregation queries, it reduces the amount of shuffle read/write in those queries, and so processing is much faster.
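To make the idea concrete, here is a small pure-Python sketch (not Spark code) that simulates partitioning rows by key. The function names and the modulo-based partitioner are assumptions made for illustration only; Spark uses its own hash partitioner, and the corresponding PySpark call would be along the lines of `df.repartition(n, "key")`.

```python
import random

def partition_of(key, num_partitions):
    # Stand-in for a hash partitioner: an integer key maps to key mod n.
    return key % num_partitions

def rows_to_move(partitions, num_partitions):
    """Count rows that are not yet in the partition their key maps to."""
    return sum(
        1
        for idx, rows in enumerate(partitions)
        for key, _value in rows
        if partition_of(key, num_partitions) != idx
    )

def repartition_by_key(partitions, num_partitions):
    """Redistribute rows so that equal keys land in the same partition."""
    result = [[] for _ in range(num_partitions)]
    for rows in partitions:
        for key, value in rows:
            result[partition_of(key, num_partitions)].append((key, value))
    return result

random.seed(0)
num_partitions = 4
# Hypothetical data set: 1000 rows with integer keys in the range 0..99.
rows = [(random.randrange(100), i) for i in range(1000)]
# Arbitrary initial layout: rows are dealt out round-robin, ignoring the key.
unpartitioned = [rows[i::num_partitions] for i in range(num_partitions)]
partitioned = repartition_by_key(unpartitioned, num_partitions)

moves_before = rows_to_move(unpartitioned, num_partitions)
moves_after = rows_to_move(partitioned, num_partitions)
# After repartitioning by key, no row needs to move for a key-based operation.
print(moves_before, moves_after)
```

Note that the repartition step itself moves the data once. The saving comes when later joins or aggregations on the same key can reuse that layout instead of shuffling again.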

Also, by increasing the number of partitions, we make each aggregation task smaller and more manageable for the processor, which reduces the processing time.
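As a rough back-of-the-envelope check, the snippet below shows how the number of rows each task has to handle shrinks as the partition count grows. The row count and partition counts are hypothetical, and the calculation assumes the rows are spread evenly across partitions, which real data often is not.

```python
import math

def rows_per_partition(total_rows, num_partitions):
    # Upper bound on rows per partition, assuming an even spread.
    return math.ceil(total_rows / num_partitions)

total_rows = 10_000_000  # hypothetical data set size
for n in (8, 64, 200):
    print(n, rows_per_partition(total_rows, n))
```

More partitions are not always better: each partition becomes a task, and a very large number of tiny tasks adds scheduling overhead.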

Thanks.
Comments
Author

In this code, is it required to set shuffle partitions? Could you please explain when we should use shuffle partitions? I am a bit confused between repartition and shuffle partitions. Thank you in advance.

narikinabiillijyotsna
Author

How do we determine the number of partitions to use?

Basket-hbjc