Managing Spark Partitions | Spark Tutorial | Spark Interview Question

#Apache #BigData #Spark #Partitions #Shuffle #Stage #Internals #Performance #Optimisation #DeepDive #Join

Please join my channel as a member to get additional benefits such as materials on Big Data and Data Science, live streams for members, and more.

About us:
We are a technology consulting and training provider, specializing in areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.

Visit us:
Twitter:

Thanks for watching
Please Subscribe!!! Like, share and comment!
Comments
anmolchoudhary:
Hats off and hands down, dude, you are a genius!

subramanyams:
Thanks a lot, Viresh. I have not seen this level of explanation so far; this really helps. :-) Would you be able to share the PPTs, or at least the plain notes from them?

sushantsangle:
Why have we not used coalesce instead of repartition to bring the 2000 records from 13000 partitions down to 4?
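
For readers weighing the same question, here is a minimal sketch of the two options (Scala; the dataset size and partition counts are illustrative stand-ins for the video's 13000-partition example):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("coalesce-vs-repartition")
  .master("local[4]")
  .getOrCreate()

// Stand-in for the video's example: a small dataset spread
// across far more partitions than it needs.
val df = spark.range(2000).toDF("id").repartition(200)

// coalesce(4) is a narrow transformation: it merges existing
// partitions without a full shuffle, but the merged partitions
// can end up unevenly sized.
println(df.coalesce(4).rdd.getNumPartitions)    // 4

// repartition(4) triggers a full shuffle: more expensive, but
// rows are redistributed evenly across the 4 partitions.
println(df.repartition(4).rdd.getNumPartitions) // 4

spark.stop()
```

One common reason to still prefer repartition when shrinking this aggressively is that coalesce, being a narrow dependency, can also collapse the parallelism of the upstream stage and leave skewed partitions.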

RangaSwamyleela:
Please explain with an actual dataset; everyone explains this with just the numbers.

SachinChavan:
Does Spark really create empty part files if there are empty partitions? I tried to simulate this on the Databricks platform and observed that it only wrote the partitions that had data; empty partitions were not written, and hence no empty part files were created. Could you please confirm this?
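
Behaviour here varies by Spark version and by API, so it is worth testing directly. A minimal sketch to reproduce the experiment (the output path is illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("empty-partition-test")
  .master("local[4]")
  .getOrCreate()

// 10 rows forced into 100 partitions: at least 90 are empty.
val df = spark.range(10).repartition(100)

// List the part files afterwards: the DataFrame writer in recent
// Spark versions generally skips empty partitions, matching the
// Databricks observation above, whereas the older RDD API
// (e.g. saveAsTextFile) did write empty part files.
df.write.mode("overwrite").csv("/tmp/empty-partition-test")

spark.stop()
```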

rahuljain:
Thanks for the explanation. One question: how do I find the partition count of my data, like the 13k in your case, if I have some ORC files?
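
One way to check this directly (the path is illustrative). For file sources, the initial partition count is derived from the input file sizes and settings such as spark.sql.files.maxPartitionBytes:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partition-count")
  .master("local[4]")
  .getOrCreate()

val df = spark.read.orc("/path/to/orc/files")

// Number of partitions Spark planned when reading the files.
println(df.rdd.getNumPartitions)

spark.stop()
```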

snehakavinkar:
For the last example, using coalesce instead of repartition will cause less shuffling of data. Is that correct? Thanks!
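
That matches coalesce's narrow dependency; a quick way to verify it yourself is to compare the physical plans (a minimal sketch in Scala):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("shuffle-check")
  .master("local[4]")
  .getOrCreate()

val df = spark.range(2000).repartition(200)

// repartition(4) shows an Exchange (shuffle) in the plan ...
df.repartition(4).explain()

// ... while coalesce(4) shows a Coalesce node and no Exchange.
df.coalesce(4).explain()

spark.stop()
```

The caveat is that skipping the shuffle also means the merged partitions inherit any skew from the inputs.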

abhishekkn:
Why wouldn't you go for coalesce for the 13000 partitions?