Spark Scenario Interview Question | Persistence Vs Broadcast

#Spark #Persist #Broadcast #Performance #Optimization
Please join as a member of my channel to get additional benefits like materials on Big Data and Data Science, live streaming for members, and more.

About us:
We are a technology consulting and training provider specializing in technology areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.

Visit us :
Twitter :

Thanks for watching
Please Subscribe!!! Like, share and comment!!!!
Comments

This scenario was not clear when I went through other videos, but after your explanation I understood the difference. Excellent!

mrkrish

Thanks for explaining it in an easy way :)

sumitgandhi

Data in memory and data on disk will not occupy the same space. So if it is 12 GB on disk, in memory it can be 18 GB.
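The comment above is about serialization overhead: a compact serialized footprint on disk expands when deserialized into JVM objects (object headers, pointers, boxing). A minimal back-of-the-envelope sketch, assuming the roughly 1.5x expansion factor implied by the 12 GB → 18 GB figures (the real factor depends on the schema, data types, and serializer):

```python
def estimated_in_memory_gb(on_disk_gb: float, overhead: float = 1.5) -> float:
    """Estimate the deserialized in-memory size of a dataset.

    `overhead` is an assumed expansion factor; 1.5 matches the
    12 GB on disk -> 18 GB in memory example from the comment.
    """
    return on_disk_gb * overhead

print(estimated_in_memory_gb(12))  # -> 18.0
```

Serialized storage levels such as `MEMORY_ONLY_SER` trade CPU (deserialization on access) for a footprint closer to the on-disk size.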

umamahesh

Is it approximately 12 GB after persisting? Is there any significant overhead when the data is in memory?

gemini_

Hi Viresh, thanks for the video.
Can you confirm the statement below?

In persist, each executor saves a partition of the data frame in memory, while in broadcast each executor saves the entire data frame in memory?

HemanthKumar-cmlv

Why are you taking only 3 executors here?

hiteshpatil

In the case of broadcast, why do we have to include the 12 GB of the existing DF? I feel it is unfair to compare persist with broadcast. Is it possible to avoid the 12 GB?

gemini_

Hi Viresh, thanks for this nice video. I believe the broadcast variable is used to broadcast a small table and join it with a huge table, which avoids shuffling. What happens if we broadcast a table with a larger number of columns to the executors? Assume the broadcast table is larger in size because it has more columns.
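The use case this comment describes can be illustrated without a cluster. Below is a plain-Python sketch of a broadcast hash join: the small table is turned into a hash map and "shipped" whole to every partition of the large table, so each partition joins locally and the large side is never shuffled. The table contents and key names are made up for illustration:

```python
# Hypothetical data: a large fact table split into partitions,
# and a small dimension table we want to broadcast.
large_partitions = [
    [(1, "click"), (2, "view")],
    [(2, "click"), (3, "view")],
]
small_table = [(1, "mobile"), (2, "desktop"), (3, "tablet")]

# "Broadcast": build a hash map once; every partition gets a copy.
broadcast_map = dict(small_table)

def join_partition(partition):
    # Each partition joins locally against its copy of the small table;
    # rows of the large table never move between partitions (no shuffle).
    return [(key, event, broadcast_map[key])
            for key, event in partition
            if key in broadcast_map]

joined = [row for part in large_partitions for row in join_partition(part)]
print(joined)
# -> [(1, 'click', 'mobile'), (2, 'view', 'desktop'),
#     (2, 'click', 'desktop'), (3, 'view', 'tablet')]
```

The sketch also answers the question about wide broadcast tables: every executor must hold the entire map, so a broadcast table that grows (whether in rows or columns) raises the memory cost on every executor at once, which is why broadcasting is reserved for tables that stay comfortably small.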

mateen

I don't understand the premise of sending the whole dataset to each executor; you are defeating the purpose of Spark, which is distributing data over the network.
Second, if you clearly state what the comparison is, then this is really a straightforward task (I guess you forgot about garbage collection of the original 12 GB of data as well; correct me if I am wrong). I would be more interested in a comparison of the data in transit.
Lastly, I think it would be more challenging to compare the shuffle and broadcast operations.

xxx

Why is the number of partitions three in the case of the broadcast join? Can we keep it low, like 1 or 2, or why not keep all the partitions in a single executor?

mahendramaurya

Hi Viresh, I am new to Spark, so cut me some slack for asking newbie questions.

In persistence, the data frame is held either in memory or on disk. Suppose the data from the data lake was held in executor memory, which in the given case is 4 GB, and it is completely occupied.
Now I want to read another data frame; how will the executor deal with it, since its memory is already occupied by the previously persisted data frame?

In broadcast, the memory footprint is said to be 4 times the data frame's size. Where does this memory come from, since each executor has only 4 GB? I also read somewhere that after garbage collection only 3 times is left. Why is that?

In persistence, the data is said to be stored in memory. Is it just the executor memory, which is 4 GB, or the entire system memory?

Thanks in Advance.

siddhantpathak

Hey Viresh, where do you study these concepts? Please share resources.

aneksingh

Not clear... you have only explained persistence.

anantababa

A broadcast variable is one copy per node, right? Why will it be 36 GB?
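As I understand it, Spark stores a broadcast in each executor's block manager, so the copy count follows executors (separate JVMs), not nodes: with the video's 12 GB DataFrame and 3 executors, the combined footprint is 12 × 3 = 36 GB even if some executors share a node. A small sketch of that count, using a hypothetical 2-node layout:

```python
DF_SIZE_GB = 12                                 # broadcast DataFrame (video's figure)
executors_per_node = {"node1": 2, "node2": 1}   # hypothetical 3-executor layout

# Each executor JVM deserializes its own copy of the broadcast data,
# so the number that matters is executors, not nodes.
total_executors = sum(executors_per_node.values())
broadcast_footprint_gb = DF_SIZE_GB * total_executors
print(broadcast_footprint_gb)  # -> 36
```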

suryasatish