Spark Interview Question | How many CPU Cores | How many executors | How much executor memory

Learn Data Engineering using Spark and Databricks. Prepare to crack job interviews and perform extremely well in your current job and projects. Beginner-to-advanced-level training on multiple technologies.
Fill out the inquiry form, and we will get back to you with a detailed curriculum and course information.

========================================================

SPARK COURSES
-----------------------------------------------

KAFKA COURSES
--------------------------------

AWS CLOUD
------------------------

PYTHON
------------------

========================================
We are also available on the Udemy platform.
Check out the link below for our courses on Udemy.

=======================================
You can also find us on O'Reilly Learning.

==============================
Follow us on Social Media.

========================================
Comments

With this formula,
Memory per executor = 2.5 GB~3 GB always.

For X GB of data:
No. of cores = 8X
No. of executors = 8X / 5
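A minimal sketch of that arithmetic in Python, assuming the video's defaults (128 MB partitions, 5 cores per executor, 4x the partition size of memory per core); the function name is illustrative:

    def size_cluster(data_gb, partition_mb=128, cores_per_exec=5, mem_factor=4):
        # One partition per 128 MB block, one core per partition for full parallelism.
        cores = data_gb * 1024 // partition_mb
        executors = cores // cores_per_exec          # 5 cores per executor (YARN guideline)
        exec_mem_gb = mem_factor * partition_mb * cores_per_exec / 1024
        return cores, executors, exec_mem_gb

    print(size_cluster(10))   # (80, 16, 2.5) -> 2.5 GB, ~3 GB with overhead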

vinayak

Very well explained. Thank you so much.

ETLMasters

Great, but the follow-up question to this from the interviewer is: how do we arrive at taking 4x memory per executor?

sonurohini

If the recommended memory per executor is 3 GB, then for a 10 GB file we would need only 4 executors. How come the calculation gives 16? Please kindly answer.
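The two counts come from different calculations; a restatement of both, assuming the video's 128 MB partitions and 5 cores per executor:

    # Parallelism-driven count (the video's method): one core per partition.
    partitions = 10 * 1024 // 128   # = 80 partitions in a 10 GB file
    executors = partitions // 5     # = 16 executors at 5 cores each
    # Memory-driven count (the question's method): 10 GB / ~3 GB per executor ~= 4,
    # but that ignores that each of the 80 partitions needs its own core
    # to process them all in parallel.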

ramu

What if the cluster size is fixed? Also, shouldn't we take per-node constraints into account? For example, what if the number of cores in a node is 4?

tridipdas

For the same 10 GB file, suppose we have the following resources:
38 GB of worker memory with 10 cores, 8 GB of driver memory with 2 cores, and shuffle partitions manually configured to 80.

How will it behave?

arnabghosh

Hi, is 4x a kind of standard? Please confirm.

navasampath

Can you please explain why 4x memory is required for each core?

vikaschavan

16 executors, each with 5 cores and 3 GB of RAM.

In each executor, how much data can be cached?
How much data can be processed?
What about shuffling, for narrow and wide transformations?
Any out-of-memory issues?

Do you really think a total of 80 cores and 3*16 = 48 GB of RAM is required to process 10 GB of data?

Please give a complete answer, sir.

marreddyp

Can you explain serialization in Spark with an example, showing how it is used for proper results?

sangu

Hi, what if the file is in a different storage location and the cluster manager is different from YARN? How do we calculate then?

cherukurid

Datanodes = 10
16 CPUs / node
64 GB memory / node
Please tell me which cluster config we should choose.
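One common rule of thumb for a spec like this (an assumption on my part, not taken from the video): reserve one core and 1 GB of memory per node for the OS and Hadoop daemons, keep 5 cores per executor, and budget roughly 7% of executor memory for YARN overhead. A sketch of that arithmetic:

    nodes, cores_per_node, mem_per_node_gb = 10, 16, 64
    usable_cores = cores_per_node - 1                    # 15 after reserving 1 for OS/daemons
    execs_per_node = usable_cores // 5                   # 3 executors of 5 cores each
    total_executors = nodes * execs_per_node - 1         # 29, leaving 1 slot for the driver
    mem_per_exec_gb = (mem_per_node_gb - 1) // execs_per_node   # ~21 GB per executor
    heap_gb = int(mem_per_exec_gb * 0.93)                # ~19 GB after ~7% YARN overhead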

Sauravsuman

Sir, what if we are reading a 100 GB file? In that case the number of executors will be 160. Do you think 160 executors would be the correct number here?

shivamdwivedi

Hi, about the amount of memory:

Is it 3 GB always in this case, for all sizes of data? I think we have to tweak it as per the size of the data.

vinothvk

In the last question, each and every value you took was just the default (128 MB, 4x, 512 MB, 5 cores). So let's say the question is about 50 GB of data; would 3 GB still be the answer?

vaibhavtyagi

If the number of cores is 5 per executor:
at shuffle time, Spark by default creates 200 partitions. How will those 200 partitions be created if the number of cores is lower, given that 1 partition is processed on 1 core?

Suppose that my config is 2 executors, each with 5 cores.
Now, how will it create 200 partitions if I do a group-by operation?
There are 10 cores, and 200 partitions need to be placed on them, right?
How is that possible?
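For context on what the scheduler actually does: a core does not hold a partition permanently; it processes one task at a time and then takes the next, so 200 shuffle partitions on 10 cores simply run in waves. A small sketch (spark.sql.shuffle.partitions is the real config key; the numbers mirror the comment above):

    import math

    shuffle_partitions = 200          # spark.sql.shuffle.partitions default
    total_cores = 2 * 5               # 2 executors x 5 cores
    waves = math.ceil(shuffle_partitions / total_cores)
    print(waves)                      # 20 sequential waves of 10 tasks each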

Amarjeet-fblk

I have applied 4x memory to each core for a 5 GB file but no luck. Can you please help me figure out how to resolve this issue?


Road map:
1) Find the number of partitions -> 5 GB (5120 MB) / 128 MB = 40
2) Find the CPU cores for maximum parallelism -> 40 cores, one per partition
3) Find the maximum allowed CPU cores per executor -> 5 cores per executor on YARN
4) Number of executors = total cores / executor cores -> 40 / 5 = 8 executors

Amount of memory required

Road map:
1) Find the partition size -> 128 MB by default
2) Assign a minimum of 4x memory to each core -> 4 x 128 MB = 512 MB per core
3) Multiply it by the executor cores to get the executor memory -> 512 MB x 5 = 2560 MB, i.e. ~2.5 GB (round up to ~3 GB with overhead)
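A sketch of wiring those derived numbers into a PySpark session, assuming a YARN deployment; spark.executor.instances, spark.executor.cores, and spark.executor.memory are the standard Spark config keys, and the app name is illustrative:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("sizing-demo")                       # illustrative name
        .config("spark.executor.instances", "8")      # 40 cores / 5 per executor
        .config("spark.executor.cores", "5")          # YARN guideline
        .config("spark.executor.memory", "3g")        # 512 MB x 5 cores, rounded up
        .getOrCreate()
    )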

ultimo

How did you assume that each core will require 4x the partition size?

vipuljohri

Sir, is there any way to get Databricks certification vouchers?

sandippatel

Hello sir, how do we process 100 GB of data? How can we calculate memory, executors, and the driver? Please help me.

raviyadav-dttb