Big Data Engineering Mock Interview | Big Data Pipeline | AWS Cloud Services | Project Architecture

I have trained more than 20,000 professionals in the field of Data Engineering in the last 5 years.

Want to Master SQL? Learn SQL the right way through the most sought after course - SQL Champions Program!

"An 8-week program designed to help you crack the interviews of top product-based companies by developing a thought process and an approach to solve an unseen problem."

Here is how you can register for the Program -

30 INTERVIEWS IN 30 DAYS- BIG DATA INTERVIEW SERIES

This mock interview series is launched as a community initiative under Data Engineers Club, aimed at aiding the community's growth and development.

Links to the free SQL & Python series developed by me are given below -

Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!

Social Media Links:

Discussed Questions : Timestamp
2:34 Brief overview of projects.
3:19 Describe your data pipeline flow and architecture.
5:10 What transformations do you use, and in which format do you write data to Redshift?
6:44 How do you handle null values?
9:03 Which file format do you use for end-user data?
9:50 Why is Parquet preferred over ORC?
11:10 What are the join types in Hive?
12:07 Which types of joins are used to avoid shuffling in Hive and PySpark? Do you know the specific term?
12:53 Explain how broadcast join avoids shuffling.
14:07 Which property controls broadcast join in Spark?
14:40 How do you start a Spark application in PySpark?
16:09 What does the builder do in Spark session creation?
17:43 What are the partitioning types in Hive?
18:36 Difference between managed and external tables in Hive.
19:16 Have you performed Spark performance tuning?
19:36 Difference between repartition and coalesce in Spark?
20:25 Have you used NoSQL databases?
21:02 SQL coding question

Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs
Comments

Parquet is a columnar storage format, so it is a very good fit for queries that read only some of the columns. Reading just the required columns reduces disk I/O and network bandwidth. It also has built-in support for compression codecs such as Snappy, which reduces storage usage. Another point I can think of is the file structure, which has three parts: a header, a body, and a footer. The header is a short magic marker (PAR1), the body holds the actual data, organized into row groups and column chunks, and the footer holds the metadata. This metadata includes the minimum and maximum values of each column. So whenever we query data stored in Parquet format, the engine can use this metadata to skip data that cannot match the filter, which in turn speeds up query execution. Hope it helps.
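The data skipping described above can be illustrated with a toy model. This is only a sketch in plain Python, not Parquet's actual reader: the `RowGroup` class and the `scan_equal` function are hypothetical names made up for illustration, and each "row group" holds a single integer column whose minimum and maximum are stored the way footer statistics would be.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class RowGroup:
    """Toy stand-in for one row group of a single integer column (hypothetical)."""
    values: List[int]
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self) -> None:
        # Statistics are computed once at "write time", like footer metadata.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_equal(row_groups: List[RowGroup], target: int) -> Tuple[List[int], int]:
    """Return the matching values and the number of row groups actually read."""
    matches: List[int] = []
    groups_read = 0
    for group in row_groups:
        # Skip the whole group when the statistics prove it cannot contain target.
        if target < group.min_value or target > group.max_value:
            continue
        groups_read += 1
        matches.extend(v for v in group.values if v == target)
    return matches, groups_read


row_groups = [RowGroup([1, 5, 9]), RowGroup([10, 15, 19]), RowGroup([20, 25, 29])]
matches, groups_read = scan_equal(row_groups, 15)
print(matches, groups_read)
```

Here only one of the three groups is read for the filter `value = 15`; the other two are ruled out by their min/max range alone. Real engines apply the same idea to the statistics stored in the Parquet footer, with more cases (nulls, ranges, multiple columns) than this sketch covers.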

imranhossain

One of the most informative interviews I have ever watched. Big shout-out to Satinder Singh, who explained each topic in a clear and understandable way. Thank you.

MANGESHpawarsm

Hi Sumit Sir,
In the first SQL problem, where we need to find the subject-wise toppers, row_number() fails when two top scorers have the same marks in a subject, because it keeps only one of them. Please check the example below:
student_name, subject, marks -- rank (derived column)
stud_1, maths, 90 -- 1
stud_2, maths, 90 -- 1
stud_1, economics, 95 -- 1
stud_2, economics, 90 -- 2
stud_3, economics, 88 -- 3
Instead of row_number(), we can use either rank() or dense_rank(), since we only need the students ranked first (those with the highest marks in each subject). My approach is as follows:
WITH top_scorers AS
(
    SELECT student_name,
           subject,
           marks,
           DENSE_RANK() OVER (PARTITION BY subject ORDER BY marks DESC) AS rnk
    FROM student_marks
)

SELECT student_name,
       subject,
       marks
FROM top_scorers
WHERE rnk = 1;
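To check the tie case, here is a small sketch that runs the query above against the sample rows using Python's built-in `sqlite3` module (window functions need SQLite 3.25 or newer). The in-memory table, the `toppers` helper, and the added `ORDER BY` are assumptions for the demonstration; the column names follow the example in this comment.

```python
import sqlite3

# Sample rows from the example above: two students tie for the top maths score.
rows = [
    ("stud_1", "maths", 90),
    ("stud_2", "maths", 90),
    ("stud_1", "economics", 95),
    ("stud_2", "economics", 90),
    ("stud_3", "economics", 88),
]

QUERY = """
WITH top_scorers AS (
    SELECT student_name, subject, marks,
           {fn}() OVER (PARTITION BY subject ORDER BY marks DESC) AS rnk
    FROM student_marks
)
SELECT student_name, subject, marks
FROM top_scorers
WHERE rnk = 1
ORDER BY subject, student_name
"""


def toppers(conn: sqlite3.Connection, fn: str) -> list:
    """Run the topper query with the given ranking function (hypothetical helper)."""
    return conn.execute(QUERY.format(fn=fn)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_marks (student_name TEXT, subject TEXT, marks INTEGER)"
)
conn.executemany("INSERT INTO student_marks VALUES (?, ?, ?)", rows)

with_row_number = toppers(conn, "ROW_NUMBER")
with_dense_rank = toppers(conn, "DENSE_RANK")

print(len(with_row_number), "rows with ROW_NUMBER")
print(len(with_dense_rank), "rows with DENSE_RANK")
```

With ROW_NUMBER only one of the two tied maths students comes back, and which one is not guaranteed. With DENSE_RANK (or RANK) both tied students are returned alongside the single economics topper.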

grim_rreaperr

Best interview I have ever seen. Both of you are very good at your level.

KiyanshLife

The interview was focused mostly on PySpark and SQL. We expected the interviewer to ask more questions on AWS cloud as well, because PySpark has already been covered a lot in most of the interview videos posted. Questions on AWS would have been very helpful.

mohammedalikhan

This interview is really great, as Satinder explained some concepts, such as the property for broadcast joins, more clearly. Thanks, Sumit Sir! Expecting more videos like this.

sruthiselvakumar

Thank you so much, Satinder sir. It's very informative and useful when giving interviews. Excellent.

sunitasolankar

This was a good interview. Different from the earlier ones. Satinder's questions and advice were very good.

SreemantaKesh

23:05 Use dense_rank instead of row_number, because more than one student may have the same highest marks in the same subject.

Hope-xbjv

This was a good interview and Satinder has good experience as an interviewer.

abhishekmodak

Very informative. One of the best mock interviews, with proper, detailed answers.

safarnama

Sir, I personally want to see more of Satinder sir's interviews 😊

AliKhanLuckky

Very informative video, liked the point of view by Satinder Sir.

abhishekkmalik

Satinder sir is awesome, always something to learn from his questions.

tanujarora

Thanks for uploading such a great Interview video Sir!

DesireIsIrrelevant

The interview was insightful. Learnt core concepts of Spark from Satinder.

Sagar

What's the difference between the Parquet and Delta formats?

sabyspeaksonline

Aditya - you need to be strong in the basics and always answer in a straightforward, crisp way, point by point. Don't beat around the bush.

ashwenkumar

It's really helpful, sir. Thank you so much.

DataJourneyHuub

Sir, please continue the Python course along with this 🙏

Abhishek-