Spark Scenario Based Interview Question | Missing Code

#Apache #BigData #Spark #Shuffle #Stage #Internals #Performance #Optimization #DeepDive #Join #Persist #Broadcast

Please join as a member of my channel to get additional benefits like materials in Big Data and Data Science, live streaming for members, and many more.

About us:
We are a technology consulting and training provider specializing in technology areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.

Visit us:
Twitter:

Thanks for watching
Please Subscribe!!! Like, share and comment!!!!
Comments

I am a bit late. I faced the same question 2 months ago. Thanks for your wonderful video.

RaviKumar-uuro

Thanks for sharing this. Good explanation. Just one pointer: DAG stands for Directed Acyclic Graph, not Dynamic Acyclic Graph.

pritambanerjee

Interesting question. Waiting for some more. Nice work.

jaisingh-lbfp

The 2nd map will not be executed, as no action is performed on the result dataset after collect.

jalsacentre
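
A minimal sketch of the scenario (reconstructed, so the variable names are assumptions): once collect() runs, result is a plain Scala Array, and the final map is Array.map, which executes eagerly on the driver with no further Spark action needed.

    import org.apache.spark.{SparkConf, SparkContext}

    object CollectThenMap {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("collect-then-map").setMaster("local[*]"))

        val rdd    = sc.parallelize(1 to 10)     // distributed RDD
        val result = rdd.map(_ * 2).collect()    // action: a job runs, Array[Int] arrives at the driver

        // This is scala.Array.map, not RDD.map: it runs eagerly,
        // on the driver, and launches no Spark job.
        val plusOne = result.map(_ + 1)
        println(plusOne.mkString(","))

        sc.stop()
      }
    }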

How will the last map operation run on the driver? See, up to collect a job is completed, and whenever we call another action it creates a new job with a new DAG, which is again distributed and run on the executors, right?

shivankchaturvedi
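
Each action on an RDD does submit a new job with its own DAG to the executors, but a map over already-collected data never reaches the scheduler at all; it is ordinary driver-side Scala. A short sketch (the job counts are what the Spark UI would show):

    // assuming an existing SparkContext named sc
    val rdd = sc.parallelize(1 to 100)

    rdd.count()                    // action -> job 1, runs on executors
    rdd.map(_ * 2).collect()       // action -> job 2, new DAG, executors again

    val local = rdd.collect()      // action -> job 3
    local.map(_ * 2)               // no job 4: plain Scala map on the driver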

A use case which I heard: a text file has a billion lines, and the task is to search for a particular word. If that word is found, we stop searching and move to the next step; until the word is found, we need to continue searching. I need to know the best optimized way for this use case. Thanks in advance :)

rakeshadhikari
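
One optimized approach (a sketch; the path and search word are hypothetical) is filter followed by take(1): take evaluates partitions incrementally and, because the filtered iterator is lazy, stops reading as soon as a single match is found, instead of scanning all the lines the way count() or collect() would.

    // assuming an existing SparkContext named sc
    val word  = "error"                              // assumed search term
    val lines = sc.textFile("hdfs:///data/big.txt")  // assumed path

    // take(1) runs jobs over a few partitions at a time and
    // short-circuits once one matching line has been found.
    val found = lines.filter(_.contains(word)).take(1).nonEmpty

    if (found) println(s"'$word' found, moving to the next step")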

What if, after calling an action, the dataset that comes to the driver node is too huge to be accommodated on the driver node? What will happen then?

GAURAVGUPTA-zubu
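
In that case the job fails rather than spilling: either the driver throws an OutOfMemoryError, or Spark aborts the job once the serialized results exceed spark.driver.maxResultSize (1g by default). A sketch of the usual alternatives to a full collect():

    // assuming an existing SparkContext named sc
    val rdd = sc.parallelize(1 to 1000000)

    val sample = rdd.take(1000)          // bring back only a bounded sample

    // Stream one partition at a time to the driver; peak driver
    // memory is roughly one partition, not the whole dataset.
    rdd.toLocalIterator.foreach(println)

    // Or keep the data distributed and write it out instead of collecting.
    rdd.saveAsTextFile("hdfs:///out/results")   // assumed output path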

If a collect operation is performed on an RDD, it gives results in the form of a List, Map, or single object, not an RDD. So how is it possible to apply a map transformation to the collect action's resultant data?

ProgrammingCrag
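
It is possible because the map applied after collect is the Scala collection's own map, not the RDD transformation: collect() on an RDD[T] returns Array[T], and Arrays, Lists, and Maps all define map themselves. A small sketch of the types involved:

    // assuming an existing SparkContext named sc
    val rdd: org.apache.spark.rdd.RDD[Int] = sc.parallelize(Seq(1, 2, 3))

    val collected: Array[Int] = rdd.collect()   // action: a local Array, not an RDD

    // scala.Array's own map: compiles and runs fine, entirely on the driver.
    val doubled: Array[Int] = collected.map(_ * 2)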

How to decide the number of buckets in Hive? Is there any formula to calculate it? Please explain by taking an example, or provide a link.

Raghav
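
There is no official formula; a common rule of thumb (an assumption to tune, not a Hive rule) is to size each bucket file around one HDFS block or a small multiple of it, giving numBuckets ≈ table size / target bucket size, often rounded up to a power of two so the bucket counts of tables joined together stay multiples of one another. A sketch of that arithmetic, with illustrative numbers:

    val tableSizeBytes   = 512L * 1024 * 1024 * 1024   // ~512 GB table
    val targetBucketSize = 256L * 1024 * 1024          // ~2x a 128 MB HDFS block

    val raw = math.ceil(tableSizeBytes.toDouble / targetBucketSize).toInt   // 2048

    def nextPowerOfTwo(n: Int): Int = { var p = 1; while (p < n) p *= 2; p }
    val numBuckets = nextPowerOfTwo(raw)   // 2048 buckets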

What if both the driver and the worker node are installed on the same node?

Karmihir
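
That is a perfectly normal setup: in local mode the driver and the executors even share a single JVM, and the collect-then-map distinction is unchanged; the collected data simply never crosses the network. A minimal sketch:

    import org.apache.spark.sql.SparkSession

    // local[*]: driver and "executors" run as threads in one JVM on one machine.
    val spark = SparkSession.builder()
      .appName("single-node")
      .master("local[*]")
      .getOrCreate()

    // Semantics are the same: collect() still materializes results at the driver,
    // it is just an in-process copy rather than a network transfer.
    val result = spark.sparkContext.parallelize(1 to 5).collect()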

Is data loss possible in Spark SQL if there are lots of joins? The sum function is not working consistently: on two servers running the same application, the results are different.

ankurrunthala

Could you please help me with the below query? Suppose I need to create an application that loads multiple CSV files using DataFrames. If any file's structure is different from the structure we defined in the beginning, then we need to redirect those files into an error folder and load only the files with the correct structure. How would we achieve this in Spark?

Ex: the file columns should be id, name, roll, but some files have id, name, city, subject. I need to load only the files having the id, name, roll columns.

Raghav
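
A sketch of one way to do it (the paths are hypothetical): read just the header of each file, compare its columns against the expected list, move mismatching files to the error folder via the Hadoop FileSystem API, and then load the remaining files in a single read.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-validate").getOrCreate()
    val fs    = FileSystem.get(spark.sparkContext.hadoopConfiguration)

    val expected = Seq("id", "name", "roll")   // the structure defined in the beginning
    val inputDir = "hdfs:///in/csv"            // hypothetical paths
    val errorDir = "hdfs:///in/errors"

    val allFiles = fs.listStatus(new Path(inputDir)).map(_.getPath.toString)

    // With header=true and no schema inference, only the header line is read here.
    val (good, bad) = allFiles.partition { file =>
      spark.read.option("header", "true").csv(file).columns.toSeq == expected
    }

    // Redirect files with the wrong structure into the error folder.
    bad.foreach(f => fs.rename(new Path(f), new Path(errorDir, new Path(f).getName)))

    // Load only the files with the correct structure.
    val df = spark.read.option("header", "true").csv(good: _*)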

Here, the variable Result is a collection and no longer an RDD. Are you sure this code would work at the last line? It is attempting to do an RDD map transformation on the collection Result.

gautampram