Spark Scenario Based Interview Question | Missing Code

#Apache #BigData #Spark #Shuffle #Stage #Internals #Performance #Optimization #DeepDive #Join #Persist #Broadcast

Please join as a member of my channel to get additional benefits like materials in Big Data and Data Science, live streaming for members, and many more.

About us:
We are a technology consulting and training provider specializing in technology areas such as Machine Learning, AI, Spark, Big Data, NoSQL, graph databases, Cassandra, and the Hadoop ecosystem.

Visit us:
Twitter:

Thanks for watching
Please Subscribe!!! Like, share and comment!!!!
Comments

I am a bit late. I faced the same question 2 months ago. Thanks for your wonderful video.

RaviKumar-uuro

Thanks for sharing this. Good explanation. Just one pointer: DAG stands for Directed Acyclic Graph, not Dynamic Acyclic Graph.

pritambanerjee

Interesting question. Waiting for some more. Nice work.

jaisingh-lbfp

The 2nd map will not be executed, as no action is performed on the result dataset after collect.

jalsacentre
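
A minimal sketch of the scenario (reconstructed, so the variable names are assumptions): once collect() runs, result is a plain Scala Array, and the final map is Array.map, which executes eagerly on the driver with no further Spark action needed.

    import org.apache.spark.{SparkConf, SparkContext}

    object CollectThenMap {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("collect-then-map").setMaster("local[*]"))

        val rdd    = sc.parallelize(1 to 10)     // distributed RDD
        val result = rdd.map(_ * 2).collect()    // action: a job runs, Array[Int] arrives at the driver

        // This is scala.Array.map, not RDD.map: it runs eagerly,
        // on the driver, and launches no Spark job.
        val plusOne = result.map(_ + 1)
        println(plusOne.mkString(","))

        sc.stop()
      }
    }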

How will the last map operation run on the driver? See, up to collect a job is completed, and whenever we call another action it creates a new job with a new DAG, which is again distributed and run on the executors, right?

shivankchaturvedi
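
Each action on an RDD does submit a new job with its own DAG to the executors, but a map over already-collected data never reaches the scheduler at all; it is ordinary driver-side Scala. A short sketch (the job counts are what the Spark UI would show):

    // assuming an existing SparkContext named sc
    val rdd = sc.parallelize(1 to 100)

    rdd.count()                    // action -> job 1, runs on executors
    rdd.map(_ * 2).collect()       // action -> job 2, new DAG, executors again

    val local = rdd.collect()      // action -> job 3
    local.map(_ * 2)               // no job 4: plain Scala map on the driver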

A use case which I heard: a text file has a billion lines, and the task is to search for a particular word. If that word is found, we stop searching and move to the next step; until the word is found, we need to continue searching. I need to know the best optimized way for this use case. Thanks in advance :)

rakeshadhikari
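
One optimized approach (a sketch; the path and search word are hypothetical) is filter followed by take(1): take evaluates partitions incrementally and, because the filtered iterator is lazy, stops reading as soon as a single match is found, instead of scanning all the lines the way count() or collect() would.

    // assuming an existing SparkContext named sc
    val word  = "error"                              // assumed search term
    val lines = sc.textFile("hdfs:///data/big.txt")  // assumed path

    // take(1) runs jobs over a few partitions at a time and
    // short-circuits once one matching line has been found.
    val found = lines.filter(_.contains(word)).take(1).nonEmpty

    if (found) println(s"'$word' found, moving to the next step")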

What if, after calling an action, the dataset that comes to the driver node is too huge to be accommodated on the driver node? What will happen then?

GAURAVGUPTA-zubu
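
In that case the job fails rather than spilling: either the driver throws an OutOfMemoryError, or Spark aborts the job once the serialized results exceed spark.driver.maxResultSize (1g by default). A sketch of the usual alternatives to a full collect():

    // assuming an existing SparkContext named sc
    val rdd = sc.parallelize(1 to 1000000)

    val sample = rdd.take(1000)          // bring back only a bounded sample

    // Stream one partition at a time to the driver; peak driver
    // memory is roughly one partition, not the whole dataset.
    rdd.toLocalIterator.foreach(println)

    // Or keep the data distributed and write it out instead of collecting.
    rdd.saveAsTextFile("hdfs:///out/results")   // assumed output path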

If a collect operation is performed on an RDD, it gives results in the form of a List, Map, or single object, not an RDD. So how is it possible to apply a map transformation to the collect action's resultant data?

ProgrammingCrag
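
It is possible because the map applied after collect is the Scala collection's own map, not the RDD transformation: collect() on an RDD[T] returns Array[T], and Arrays, Lists, and Maps all define map themselves. A small sketch of the types involved:

    // assuming an existing SparkContext named sc
    val rdd: org.apache.spark.rdd.RDD[Int] = sc.parallelize(Seq(1, 2, 3))

    val collected: Array[Int] = rdd.collect()   // action: a local Array, not an RDD

    // scala.Array's own map: compiles and runs fine, entirely on the driver.
    val doubled: Array[Int] = collected.map(_ * 2)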

How to decide the number of buckets in Hive? Is there any formula to calculate it? Please explain by taking an example, or provide a link.

Raghav
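
There is no official formula; a common rule of thumb (an assumption to tune, not a Hive rule) is to size each bucket file around one HDFS block or a small multiple of it, giving numBuckets ≈ table size / target bucket size, often rounded up to a power of two so the bucket counts of tables joined together stay multiples of one another. A sketch of that arithmetic, with illustrative numbers:

    val tableSizeBytes   = 512L * 1024 * 1024 * 1024   // ~512 GB table
    val targetBucketSize = 256L * 1024 * 1024          // ~2x a 128 MB HDFS block

    val raw = math.ceil(tableSizeBytes.toDouble / targetBucketSize).toInt   // 2048

    def nextPowerOfTwo(n: Int): Int = { var p = 1; while (p < n) p *= 2; p }
    val numBuckets = nextPowerOfTwo(raw)   // 2048 buckets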

What if both the driver and the worker node are installed on the same node?

Karmihir
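
That is a perfectly normal setup: in local mode the driver and the executors even share a single JVM, and the collect-then-map distinction is unchanged; the collected data simply never crosses the network. A minimal sketch:

    import org.apache.spark.sql.SparkSession

    // local[*]: driver and "executors" run as threads in one JVM on one machine.
    val spark = SparkSession.builder()
      .appName("single-node")
      .master("local[*]")
      .getOrCreate()

    // Semantics are the same: collect() still materializes results at the driver,
    // it is just an in-process copy rather than a network transfer.
    val result = spark.sparkContext.parallelize(1 to 5).collect()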

Is data loss possible in Spark SQL if there are lots of joins? The sum function is not working consistently: on two servers running the same application, the results are different.

ankurrunthala

Could you please help me with the below query? Suppose I need to create an application that loads multiple CSV files using DataFrames. If any file's structure is different from the structure we defined in the beginning, then we need to redirect those files into an error folder and load only the files with the correct structure. How would we achieve this in Spark?

Ex: the file columns should be id, name, roll, but some files have id, name, city, subject. I need to load only the files having the id, name, roll columns.

Raghav
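
A sketch of one way to do it (the paths are hypothetical): read just the header of each file, compare its columns against the expected list, move mismatching files to the error folder via the Hadoop FileSystem API, and then load the remaining files in a single read.

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("csv-validate").getOrCreate()
    val fs    = FileSystem.get(spark.sparkContext.hadoopConfiguration)

    val expected = Seq("id", "name", "roll")   // the structure defined in the beginning
    val inputDir = "hdfs:///in/csv"            // hypothetical paths
    val errorDir = "hdfs:///in/errors"

    val allFiles = fs.listStatus(new Path(inputDir)).map(_.getPath.toString)

    // With header=true and no schema inference, only the header line is read here.
    val (good, bad) = allFiles.partition { file =>
      spark.read.option("header", "true").csv(file).columns.toSeq == expected
    }

    // Redirect files with the wrong structure into the error folder.
    bad.foreach(f => fs.rename(new Path(f), new Path(errorDir, new Path(f).getName)))

    // Load only the files with the correct structure.
    val df = spark.read.option("header", "true").csv(good: _*)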

Here, the variable Result is a collection and no longer an RDD. Are you sure this code would work at the last line? It is attempting to do an RDD map transformation on the collection Result.

gautampram