Spark Memory Management | How to calculate the cluster Memory in Spark

Hi Friends,
In this video, I have explained how to calculate the Spark cluster memory.

Please subscribe to my channel for more interesting learnings.
Comments
Author

Much needed for interviews. Thanks for sharing, Sravana.

venukumargadiparthy
Author

A small correction: I think --num-executors applies across all nodes in the cluster, so it should not be 3; it should be 3 * 5 = 15.

RaXiUs
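
The correction above comes down to simple arithmetic. Here is a minimal Python sketch, assuming (as the comment implies) a 5-node cluster with 3 executors planned per node. The function names are hypothetical; they only illustrate that `--num-executors` takes the cluster-wide total rather than the per-node count.

```python
def total_executors(nodes: int, executors_per_node: int) -> int:
    """Cluster-wide executor count: per-node count times number of nodes."""
    return nodes * executors_per_node


def num_executors_flag(nodes: int, executors_per_node: int) -> str:
    """Build the spark-submit flag using the cluster-wide total."""
    return f"--num-executors {total_executors(nodes, executors_per_node)}"


# Assumed example values taken from the comment: 5 nodes, 3 executors each.
print(num_executors_flag(nodes=5, executors_per_node=3))
```

This sketch does not reserve an executor slot for the application master or anything else; whether to do so depends on the cluster setup.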
Author

A great explanation! Thank you so much.

hamidkureshi
Author

Thank you so much for this video. Very well explained. Can you please make more videos on interview questions related to this topic?

heenagirdher
Author

Say we want to process 1 TB of data with the cluster capacity given in your example:
1. When might we get an executor OOM issue?
2. When will we not get an OOM issue?
3. How can Spark do a sort-merge shuffle join (2 DataFrames, 500 GB each)?
4. Please briefly explain how Spark handles big data without OOM issues, and when it may still hit OOM, with examples and code.

universaltv
Author

--num-executors should be 15, right? We need to give the total number of executors to be used, not the number per node. Please correct me if I am wrong.

leedsshri
Author

Can you please name the tools that handle memory optimization?

udaynayak
Author

Even if we have only 1 GB of input data, should I use the same parameters?

HollyJollyTolly
Author

From Spark 3.0 onwards, these calculations are done by Spark by default, right?

manjunathbn
Author

Could you kindly explain the difference between the application master and the driver?
Why are we giving less memory (2 GB) to the driver?

antonyjesu
Author

Sorry to ask again.
Suppose the question is that the raw data to process is 10 GB.
Based on 10 GB of raw data, how do we calculate the memory?
Please.

antonyjesu
Author

If we request an executor with 4 GB, we are actually requesting a container of 4 GB (heap memory) + max(384 MB, 10% of 4 GB) (off-heap overhead memory).
Out of the 4 GB of total heap memory:
300 MB is reserved memory.
4096 - 300 = 3796 MB (about 3.7 GB)
Out of this 3.7 GB, 60% goes to unified memory (storage + execution).
That is about 2278 MB (roughly 2.2 GB) for storage + execution.
The remaining 40% of the 3.7 GB goes to user memory (about 1518 MB, roughly 1.5 GB).
I am not able to relate this to your calculation. I have checked multiple videos but still cannot understand how to calculate cluster memory. Kindly help.

jywvntd
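
The breakdown in the comment above can be written out as a short calculation. This is a minimal Python sketch, not Spark's own code: it assumes the commonly cited defaults of 300 MB reserved memory, a memory fraction of 0.6 (`spark.memory.fraction`), and an overhead of max(384 MB, 10% of executor memory). The function name and dictionary keys are hypothetical, and the real values Spark reports will differ slightly because the JVM exposes a little less heap than requested.

```python
RESERVED_MB = 300        # assumed fixed reserved memory
MIN_OVERHEAD_MB = 384    # assumed minimum container overhead


def executor_memory_breakdown(heap_mb, overhead_factor=0.10, memory_fraction=0.6):
    """Approximate split of one executor's memory, all values in MB."""
    # Overhead is requested on top of the heap, outside the JVM heap.
    overhead_mb = max(MIN_OVERHEAD_MB, overhead_factor * heap_mb)
    # Heap left after the reserved chunk is set aside.
    usable_mb = heap_mb - RESERVED_MB
    # Unified region shared by storage and execution.
    unified_mb = usable_mb * memory_fraction
    # Whatever remains of the usable heap is user memory.
    user_mb = usable_mb - unified_mb
    return {
        "container_mb": heap_mb + overhead_mb,
        "overhead_mb": overhead_mb,
        "usable_mb": usable_mb,
        "unified_mb": unified_mb,
        "user_mb": user_mb,
    }


# Example from the comment: a 4 GB (4096 MB) executor heap.
for name, value in executor_memory_breakdown(4096).items():
    print(f"{name}: {value:.1f}")
```

For a 4096 MB heap this gives roughly 410 MB of overhead, 3796 MB of usable heap, about 2278 MB of unified memory, and about 1518 MB of user memory.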