Spark memory allocation and reading large files | Spark Interview Questions

Hi Friends,

In this video, I have explained Spark memory allocation and how a 1 TB file is processed by Spark.


Please subscribe to my channel for more interesting learnings.
Comments

I have seen several videos, You are the best. Appreciate your efforts.

San-hszx

Thank you, ma'am! The concept is clear; I think that is how Spark achieves efficient pipelining.

gowthamsagarkurapati

Good explanation. I have one doubt: how did you calculate the number of blocks for a 1 TB file?
In the video, did you say 84 lakh blocks? If so, how is that number calculated?

mohammedasif

Thanks for this video. I have one question: if I have 500 GB of data, what would be the ideal cluster configuration to process it?

SunilPandey-u

I like the explanation. So, instead of 84 lakh blocks, you meant to say 8192 blocks, right?
20 executor machines
5 cores in each executor node (FYI: cores come in pairs: 2, 4, 6, 8, and so on)
6 GB RAM in each executor node
128 MB default block size

The cluster can run 20 × 5 = 100 tasks in parallel. Here a task corresponds to a block, so 100 blocks can be processed in parallel at a time.
100 × 128 MB = 12,800 MB ÷ 1024 = 12.5 GB (so about 12.5 GB of data gets processed in the first batch).

Each executor has 6 GB of RAM (20 executors × 6 GB = 120 GB total RAM). At any moment an executor is working on 5 blocks × 128 MB = 640 MB ≈ 0.6 GB, leaving roughly 6 GB − 0.6 GB = 5.4 GB of RAM free for other users' jobs and programs.

So, 1 TB = 1024 GB ÷ 12.5 GB per batch ≈ 82 batches to process the whole file.

The values in this calculation are for understanding purposes; actual values may differ in real-world scenarios.
Please feel free to comment and correct me if I'm doing anything wrong, thanks!

chrajeshdagur
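The arithmetic in the comment above can be sketched in a few lines. The cluster figures (20 executors, 5 cores each, 128 MB block size) are the commenter's assumptions, not values confirmed in the video:

```python
import math

# Assumed cluster from the comment: 20 executors, 5 cores each,
# 6 GB RAM per executor, 128 MB HDFS block size.
file_size_mb = 1024 * 1024            # 1 TB expressed in MB
block_size_mb = 128
executors, cores_per_executor = 20, 5

num_blocks = file_size_mb // block_size_mb             # one task per block
parallel_tasks = executors * cores_per_executor        # tasks running at once
batch_size_gb = parallel_tasks * block_size_mb / 1024  # data per wave of tasks
num_batches = math.ceil(num_blocks / parallel_tasks)   # waves for the full file

print(num_blocks, parallel_tasks, batch_size_gb, num_batches)  # 8192 100 12.5 82
```

This reproduces the corrected figures: 8192 blocks, 100 concurrent tasks, 12.5 GB per wave, and about 82 waves for 1 TB.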

Hi, a great explanation, no doubt. Can you please tell me how many executors there will be per machine?

vaibhavverma

Wonderful 👌👌... you've gained one more subscriber 😊... I have a very simple question for you: what is "disk" in Spark? Is it the driver's disk or the HDFS disk? In the persist operation we have disk and memory options; I understand that memory means executor memory, but what is this disk 🙄? Could you please assist? Also, there is the concept of data spilling to disk, and I'm badly confused by that disk too 😭

gyan_chakra

But what if we have to perform a group-by or join operation? Then we would need all the data in RAM for processing, right?

jjayeshpawar

Can you help me understand how Spark decides
how many tasks will run in parallel, e.g. for a 1 TB file?
I am aware that the number of tasks depends on the number of CPU cores assigned to the executors, but how does the calculation flow?

nehachopade
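A rough model of the flow the commenter asks about: for a plain file scan, Spark creates roughly one task per input split (block-sized by default), and the number of tasks that can run at once equals the total core count across executors. The cluster numbers below are illustrative assumptions:

```python
import math

def scan_parallelism(file_size_bytes, split_size_bytes, executors, cores_per_executor):
    """Rough model: one task per input split, executors*cores concurrent slots."""
    tasks = math.ceil(file_size_bytes / split_size_bytes)  # input splits -> tasks
    slots = executors * cores_per_executor                 # concurrent task slots
    waves = math.ceil(tasks / slots)                       # scheduling waves needed
    return tasks, slots, waves

# 1 TB file, 128 MB splits, 20 executors with 5 cores each (assumed numbers)
tasks, slots, waves = scan_parallelism(1 << 40, 128 << 20, 20, 5)
print(tasks, slots, waves)  # 8192 100 82
```

After a shuffle, the partition count is instead governed by settings such as `spark.sql.shuffle.partitions`, so this model only covers the initial read.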

How do I find out why executor memory is growing gradually? Spark is installed on Kubernetes; driver memory is 4 GB and executor memory is 3 GB + 1 GB (overhead). How do I check which memory area is growing more, execution or storage, and why? Once usage reaches 99%, executors are killed and there are no logs to check. Could you please suggest some pointers?

piyushpokharna
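One starting point for this question is knowing how the unified memory pool is sized: the Spark UI's Executors tab shows storage memory use, and the pool itself is derived from the heap using documented defaults. A sketch for the 3 GB executor in the question (fractions are the documented defaults; the split is only an upper bound, since execution and storage borrow from each other):

```python
# Defaults taken from the Spark configuration docs; 3 GB heap as in the question.
RESERVED_MB = 300                 # memory Spark reserves before splitting pools
heap_mb = 3 * 1024                # executor heap (spark.executor.memory = 3g)
memory_fraction = 0.6             # spark.memory.fraction default
storage_fraction = 0.5            # spark.memory.storageFraction default

unified_mb = (heap_mb - RESERVED_MB) * memory_fraction  # execution + storage pool
storage_mb = unified_mb * storage_fraction              # evictable storage share
print(round(unified_mb), round(storage_mb))  # 1663 832
```

If executors are killed by Kubernetes at 99%, the growth is often in the off-heap/overhead region rather than the unified pool, so raising `spark.executor.memoryOverhead` is a common first experiment.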

Are off-heap memory and overhead memory the same?

suriyams

How does 1 TB equal 84 lakh blocks when each block is 128 MB?

Amarjeet-fblk
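A quick calculation suggests where the 84 lakh figure may have come from: 1 TB split into 8.4 million pieces gives roughly 128 KB per piece, so it looks like a KB/MB slip. With the standard 128 MB block size the count is 8192:

```python
one_tb_bytes = 1 << 40            # 1 TiB
blocks_84_lakh = 8_400_000        # 84 lakh, as heard in the video

approx_block_bytes = one_tb_bytes / blocks_84_lakh   # ~131 KB per block
blocks_at_128mb = one_tb_bytes // (128 << 20)        # 128 MB blocks

print(round(approx_block_bytes), blocks_at_128mb)  # 130894 8192
```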

When we use off-heap memory, is GC not used for it?

suriyams
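Correct: memory enabled via the off-heap settings is allocated outside the JVM heap, so the garbage collector does not scan it (data stored there is kept in serialized form). It is also distinct from the container-level overhead allowance. A minimal sketch of the relevant configuration; the property names are real Spark settings, but the sizes are illustrative only:

```python
# Real Spark property names; the sizes here are illustrative assumptions.
conf = {
    "spark.memory.offHeap.enabled": "true",  # let Spark use off-heap execution/storage memory
    "spark.memory.offHeap.size": "2g",       # allocated outside the JVM heap, not GC-scanned
    "spark.executor.memoryOverhead": "1g",   # separate: container overhead (native libs, JVM internals)
}
for key, value in conf.items():
    print(f"--conf {key}={value}")           # as you might pass to spark-submit
```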