Data Engineering Interview | Apache Spark Interview | Live Big Data Interview

This video is part of the Spark Interview Questions Series.
Many subscribers have asked me to show what an actual Big Data interview looks like. In this video we cover what usually happens in a Big Data or data engineering interview.
There will be more videos covering different aspects of Data Engineering Interviews.

Here are a few links useful for you

If you are interested in joining our community, please join the following groups

You can drop me an email for any queries at

#apachespark #sparktutorial #bigdata
#spark #hadoop #spark3 #bigdata #dataengineer
Comments

Questions:
1) Why did you shift from MapReduce development to Spark development?
2) How is the Spark engine different from the Hadoop MapReduce engine?
3) What are the steps for Spark job optimization?
4) What are an executor and an executor core? Explain in terms of processes and threads.
5) How do you identify that your Hive script is slow?
6) When do we use partitioning and bucketing in Hive? (see the sketch below)
7) Small file problem in Hive? ---> skewness
8) How do you handle a high-cardinality issue in a dataset, with respect to Hive?
9) How do you handle code merging with other teams? Explain your development process.
10) Again, the small files issue in Hadoop?
11) Metadata size in Hadoop?
12) How is Spark differentiated from MapReduce?
13) In a class having 3 fields (name, age, salary), you create a series of objects from this class. How do you compare the objects? (I didn't get the question exactly)
14) Scala: what is === in join conditions? What does it mean?

Hope it helps!
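For question 6, here is a minimal Spark SQL sketch (Scala) of partitioning versus bucketing when writing a Hive-backed table. The table names, column names, input path, and bucket count are hypothetical placeholders, not a definitive recipe.

```scala
// Sketch: partitioning vs. bucketing in Spark SQL (Scala).
// Table/column names and paths are hypothetical.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partition-bucket-demo")
  .enableHiveSupport() // persist as Hive-compatible tables
  .getOrCreate()

val sales = spark.read.parquet("/data/sales") // hypothetical input

// Partitioning: one directory per distinct value. Best for
// low-cardinality columns that queries filter on (partition pruning).
sales.write
  .partitionBy("country")
  .mode("overwrite")
  .saveAsTable("sales_by_country")

// Bucketing: a fixed number of files hashed on a high-cardinality key.
// Helps joins/aggregations on that key without exploding directories.
sales.write
  .bucketBy(32, "user_id")
  .sortBy("user_id")
  .mode("overwrite")
  .saveAsTable("sales_bucketed")
```

Rule of thumb: partition on low-cardinality columns you filter by; bucket on high-cardinality keys you join or aggregate on.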

tradingtransformation

I really appreciate you posting this interview in the public domain. This is a really good one. It would be really great to see a video on the process of optimizing a job.

bramar

Really great video. It would have been even better if you could answer the questions the candidate was not able to answer, like: what are the symptoms of a job that tell you whether to increase the number of executors or the memory per executor? Can anyone answer here, so it benefits other candidates? Thanks a lot for this video.
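One hedged way to answer the question above: in the Spark UI, many pending tasks alongside idle cluster capacity usually point to too few executors or cores, while long GC times, shuffle spill to disk, or OutOfMemoryError point to too little memory per executor. A minimal Scala sketch of the corresponding settings follows; the numbers are placeholders, and in practice these are usually passed to spark-submit rather than hard-coded.

```scala
// Sketch only: placeholder values, normally passed via spark-submit --conf.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("executor-tuning-demo")
  // Symptom: many pending tasks, cluster under-utilised
  //  -> more executors and/or cores (assumes dynamic allocation is off).
  .config("spark.executor.instances", "8")
  .config("spark.executor.cores", "4")
  // Symptom: long GC pauses, shuffle spill, OutOfMemoryError
  //  -> more memory per executor (or fewer cores sharing the same heap).
  .config("spark.executor.memory", "8g")
  .getOrCreate()
```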

tradingtexi

@Data Savvy
It can be watched in one stretch. Really helpful. 👍🏻🙌🏻

amansinghshrinet

Since this is a mock interview, the interviewers should have given feedback at the end of the call itself, so it's helpful for viewers.

ajithkannan

Wish I didn't have the haircut that day😂😂😀😀😂😂😂

arindampatra

Awesome, Harjeet sir!!
I could watch a thousand such videos at a stretch 😁
Very informative!!!
Can't wait for long, please upload as many as you can, sir.

rohitrathod

Your interview is very helpful.
Keep up the good work 👍👍👍

chaitanya

Hadoop is meant for handling a small number of big files; the small file problem arises when file sizes are less than the HDFS block size (64 or 128 MB). Moreover, handling a bulk of small files increases pressure on the NameNode when we could handle one big file instead. So in Hadoop, file size matters a lot, which is why partitioning and bucketing came into the picture. Correct me if I made a mistake.
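Building on this comment, a common mitigation is to compact the small files so each output file approaches the HDFS block size. A minimal Spark (Scala) sketch; the paths are hypothetical and the target file count is a placeholder you would derive from total input size divided by block size.

```scala
// Sketch: compacting many small Parquet files before rewriting to HDFS.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("compact-small-files").getOrCreate()

val df = spark.read.parquet("/data/raw/events") // thousands of tiny files

// Rough sizing: targetFiles ~ total input bytes / HDFS block size (128 MB).
val targetFiles = 16 // placeholder

df.coalesce(targetFiles) // merges partitions without a full shuffle
  .write
  .mode("overwrite")
  .parquet("/data/compacted/events")
```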

kaladharnaidusompalyam

Hello, I was asked the following questions in an AWS developer interview:
Q1. We have *sensitive data* coming in from a source and API. Help me design a pipeline to bring in data, clean and transform it and park it.
Q2. So where does pyspark come into play in this?
Q3. Which all libraries will you need to import to run the above glue job?
Q4. What are shared variables in PySpark? (see the sketch after this list)
Q5. How to optimize glue jobs
Q6. How do you protect sensitive data in your dataset?
Q7. How do you identify sensitive information in your data?
Q8. How do you provision a S3 bucket?
Q9. How do I check if a file has been changed or deleted?
Q10. How do I protect my file having sensitive data stored in S3?
Q11. How does KMS work?
Q12. Do you know S3 glacier?
Q13. Have you worked on S3 glacier?
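For Q4: Spark's shared variables are broadcast variables (read-only data shipped once to every executor) and accumulators (write-only counters that only the driver reads back). A minimal sketch, written in Scala here although the question asked about PySpark; the lookup map and record codes are made up.

```scala
// Sketch of Spark's two shared-variable kinds; data is made up.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("shared-vars-demo").getOrCreate()
val sc = spark.sparkContext

// Broadcast variable: read-only lookup shipped once per executor.
val countryNames = sc.broadcast(Map("IN" -> "India", "US" -> "United States"))

// Accumulator: executors add to it, the driver reads the total.
val badRecords = sc.longAccumulator("badRecords")

val codes = sc.parallelize(Seq("IN", "US", "XX"))
val resolved = codes.map { code =>
  // Note: accumulator updates inside transformations can double-count
  // on task retries; updates inside actions are exactly-once.
  countryNames.value.getOrElse(code, { badRecords.add(1); "unknown" })
}

resolved.collect().foreach(println)
println(s"bad records: ${badRecords.value}")
```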

Anonymous-feep

This is very useful. Please make more videos like this.

kranthikumarjorrigala

Nice video. The purpose of using '===' while joining is to make sure that we are comparing the right values (join key values) and the right data types as well. Please correct me if my understanding is wrong.
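That is roughly right. In the Scala API, === is Column.equalTo: it builds an equality expression over two columns, whereas plain == is ordinary Scala object equality and returns a Boolean, which is not a valid join condition. A minimal sketch with made-up data:

```scala
// Sketch: === builds a join expression; plain == would not compile here.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("triple-equals-demo").getOrCreate()
import spark.implicits._

val users  = Seq((1, "alice"), (2, "bob")).toDF("id", "name")
val orders = Seq((1, 100.0), (2, 250.0)).toDF("user_id", "amount")

// users("id") === orders("user_id") yields a Column expression that
// Spark evaluates row by row during the join.
val joined = users.join(orders, users("id") === orders("user_id"))
joined.show()
```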

kiranmudradi

The default block size is 128 MB; when partitioning creates lots of small files, much of that storage goes to waste and more horizontal scaling is required (defeating the purpose of distribution).

ShashankGupta

Awesome video. Thank you for putting this out. It's helpful.

sukanyapatnaik

Thank you very much, this is very useful!!!

sujaijain

Very Informative.. Thanks a lot Guys...

rahulpandit

Keep up the excellent work 👍 Expecting more such videos.

sathyansathyan

Amazing job, really interesting. Thank you for sharing this interview with us.

AhlamLamo

Very, very helpful, and please do one or two more interviews of the same level.
Great effort by both the interviewer and the interviewee.

MoinKhan-cgcu

It would be really helpful if you could make more mock interviews like this. I think we have only 3 live interviews on the channel yet.

shubhamshingi