Data engineer interview question | Process 100 GB of data in Spark | Number of Executors

In this video, we discuss how to process 100 GB of data in Spark. This is one of the most famous questions asked in interviews for data engineering roles.

For more queries, reach out to me on my social media handles below.

My Gear:-

My PC Components:-
Comments

You are really filling the gap. Thanks a lot, man!

I request you to make more interactive videos like this, especially on the following topics:

1. Repartitioning with a real-time scenario: how to determine the number of partitions based on data size and cluster size

2. Key salting: a practical, real-time case with a coding example

3. Data serialization in Spark and how it helps with optimization

4. Choosing the file format for different scenarios (Parquet/JSON/ORC)

5. DAG analysis

6. Accumulators, with real-time use cases

7. Cache and persist: when to use which

8. Garbage collection tuning

9. Real-time coding issues faced by data engineers, and how to debug them

10. Version control for Databricks notebooks

11. Real-time production implementation of big data projects

12. How can we perform unit testing for Databricks notebooks?

Thanks in advance! ❤❤

neelbanerjee

Thank you, Manish Bhai. You understand what matters to aspiring data engineers and what they need to know in depth. Really appreciate this.

sohelsayyad

Mihir is just bluffing and telling generic stories. Manish did a good job by interrupting him. Keep it up.

pratikj

The interviewer asked me about processing PETABYTES of data. Can you explain how to deal with that scenario?

rh

Thanks, Manish, for this informative session. I already had this question in my mind and had been searching for an answer for a few days. Finally, today you made this video. It's like magic! Thanks a lot, man. Please make more videos on questions like these that are asked in interviews.

mranaljadhav

Hi @MANISH KUMAR,

As per Mihir's first approach (4:03),
he is considering 5 executors with 2 cores each and 10 GB of memory per executor. In this case,
5 * 2 = 10 cores in total (10 parallel processes) and
10 GB * 5 = 50 GB of total memory.
I think 5 executors with the above configuration will not handle 100 GB of data. They can only handle 50 GB.

Correct me if I am wrong.

The calculation mentioned at the end (10:53),
5 to 6 executors with 4 cores each and 15 GB of RAM per executor, seems fine.

FlashGG
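The cluster totals in the comment above can be checked with a few lines of Python. This is a minimal sketch: `ExecutorConfig` and its properties are hypothetical names, the three fields correspond to the Spark settings `spark.executor.instances`, `spark.executor.cores` and `spark.executor.memory`, and the "one task per core" reading assumes Spark's default `spark.task.cpus=1`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorConfig:
    """Hypothetical container for the three sizing knobs discussed in the video."""
    executors: int                  # spark.executor.instances
    cores_per_executor: int         # spark.executor.cores
    memory_gb_per_executor: float   # spark.executor.memory (heap, excluding overhead)

    @property
    def total_cores(self) -> int:
        # With spark.task.cpus=1 (the default), each core runs one task at a time,
        # so this is the maximum number of tasks running in parallel.
        return self.executors * self.cores_per_executor

    @property
    def total_memory_gb(self) -> float:
        return self.executors * self.memory_gb_per_executor

    @property
    def memory_gb_per_core(self) -> float:
        # Roughly the share of executor memory each concurrently running task gets.
        return self.memory_gb_per_executor / self.cores_per_executor


if __name__ == "__main__":
    configs = {
        "first approach (4:03)": ExecutorConfig(5, 2, 10),
        "final, 5 executors (10:53)": ExecutorConfig(5, 4, 15),
        "final, 6 executors (10:53)": ExecutorConfig(6, 4, 15),
    }
    for name, cfg in configs.items():
        print(f"{name}: {cfg.total_cores} cores, "
              f"{cfg.total_memory_gb:g} GB, {cfg.memory_gb_per_core:g} GB/core")
```

This prints 10 cores and 50 GB for the first approach, and 20 or 24 cores with 75 or 90 GB for the final one (3.75 GB per core in both cases). The per-core figure is often the more telling number, since tasks running on the same executor share its memory.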

This channel's reach is going to catch fire very fast; it's going to rise really quickly. Mark my words.

siddharthguliyani

Memory is not calculated by guesswork, which is what he is doing in the interview. There is a proper formula to calculate the number of executors, cores, and memory.

kyou
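The comment above does not spell out the formula. One commonly cited rule of thumb for Spark on YARN (a heuristic, not an official Spark formula and not something stated in the video) reserves 1 core and 1 GB per node for the OS and Hadoop daemons, uses about 5 cores per executor, leaves one executor slot for the YARN ApplicationMaster, and keeps room for memory overhead, which Spark by default sets to 10% of executor memory with a 384 MiB minimum. A minimal sketch, assuming a hypothetical cluster of 10 nodes with 16 cores and 64 GB each; `size_executors` is an illustrative name.

```python
import math


def size_executors(nodes: int, cores_per_node: int, mem_gb_per_node: float,
                   cores_per_executor: int = 5,
                   overhead_factor: float = 0.10,
                   min_overhead_gb: float = 0.375) -> tuple[int, int, int]:
    """Rule-of-thumb executor sizing (a heuristic, not an official Spark formula).

    Returns (num_executors, cores_per_executor, executor_memory_gb).
    """
    usable_cores = cores_per_node - 1      # leave 1 core per node for OS / daemons
    usable_mem_gb = mem_gb_per_node - 1    # leave 1 GB per node for OS / daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("nodes are too small for the requested cores_per_executor")

    num_executors = executors_per_node * nodes - 1      # 1 slot for the YARN ApplicationMaster
    container_gb = usable_mem_gb / executors_per_node   # heap + overhead per executor

    # Overhead is a fraction of the heap with a floor, so solve heap + overhead <= container.
    heap_gb = container_gb / (1 + overhead_factor)
    if overhead_factor * heap_gb < min_overhead_gb:
        heap_gb = container_gb - min_overhead_gb
    return num_executors, cores_per_executor, math.floor(heap_gb)


if __name__ == "__main__":
    # Hypothetical cluster: 10 nodes, 16 cores and 64 GB each.
    print(size_executors(10, 16, 64))  # prints (29, 5, 19)
```

On YARN, the three results would map onto the `spark-submit` options `--num-executors`, `--executor-cores` and `--executor-memory`. Note that this heuristic sizes executors to the cluster rather than to the 100 GB input; the data size mainly drives how many partitions (tasks) those executors work through.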

Brother, are there remote jobs in the data engineering field too?? US remote jobs?

asktostranger

How much knowledge of data structures does a data engineer need, and how should one learn it? Please make a video on this topic.

jaychavhan

Thank you very much, Manish, for your guidance; it is really helpful. I am your new subscriber. My query: I am a good Python developer and I know intermediate SQL, but I am very new to Spark. I have learned the Spark basics. Can you suggest a course where I can learn real-time Spark questions like this one, such as processing 100 GB of data? Are there any resources on Udemy or anywhere else? Thanks in advance, as I want to switch careers from Python developer to data engineer.

mnaveenvamshi

Great video! Can you make a video on what projects a fresher should build for a data engineer role?

rishav

This is not the correct approach, I believe. To process 100 GB of data, 800 blocks would be created. We would need more executors running in parallel. If we rely on the resources explained, it will take much more time than expected.

RaviKumar-gvwo
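The figure of 800 in the comment above matches 100 GB split into 128 MB pieces, which is both the default HDFS block size and the default `spark.sql.files.maxPartitionBytes`. A minimal sketch, assuming every partition is exactly 128 MB (real counts also depend on file format, compression, splittability, and settings such as `spark.sql.files.openCostInBytes`), of how many "waves" of tasks the 5 * 2, 5 * 4 and 6 * 4 executor layouts mentioned in the video would need; the function names are hypothetical.

```python
import math


def input_partitions(data_gb: float, partition_mb: int = 128) -> int:
    # 1 GB = 1024 MB here; each input partition becomes one task in the scan stage.
    return math.ceil(data_gb * 1024 / partition_mb)


def task_waves(partitions: int, total_cores: int) -> int:
    # With one task per core at a time, tasks run in ceil(partitions / cores) waves.
    return math.ceil(partitions / total_cores)


if __name__ == "__main__":
    parts = input_partitions(100)
    print(f"{parts} partitions")    # prints "800 partitions"
    for cores in (10, 20, 24):      # 5 * 2, 5 * 4 and 6 * 4 executors * cores
        print(f"{cores} cores -> {task_waves(parts, cores)} waves")
```

With 10 cores the scan needs 80 waves, versus 40 with 20 cores and 34 with 24. Assuming similar task durations, fewer waves means a shorter stage, which is the commenter's point about adding executors. The same picture also shows that, for a simple scan-and-transform stage, only the partitions in the current wave need to be in memory, not the full 100 GB.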

I don't think he answered the question correctly. He was confident until you raised a doubt, but I think this answer won't work in interviews, because he kept steering his answer toward resources, business, and so on. But that was not what was asked.

@Manish bhai, please share your approach to this question with calculations. I would like to know it.

ameygoesgaming

Sir, can one become a data engineer after one year of study...?

girishnigade

Can someone post the content in English too?

anupamakamepalli

Where can I learn Spark architecture in this much depth? Any resources or books you'd suggest?

manish

The guy is rambling more than he is giving a solution.

adityakishan