Azure Cloud Data Engineer Mock Interview | Important Questions Asked in Big Data Interviews | PySpark

I have trained more than 20,000 professionals in the field of Data Engineering over the last 5 years.

๐–๐š๐ง๐ญ ๐ญ๐จ ๐Œ๐š๐ฌ๐ญ๐ž๐ซ ๐’๐๐‹? ๐‹๐ž๐š๐ซ๐ง ๐’๐๐‹ ๐ญ๐ก๐ž ๐ซ๐ข๐ ๐ก๐ญ ๐ฐ๐š๐ฒ ๐ญ๐ก๐ซ๐จ๐ฎ๐ ๐ก ๐ญ๐ก๐ž ๐ฆ๐จ๐ฌ๐ญ ๐ฌ๐จ๐ฎ๐ ๐ก๐ญ ๐š๐Ÿ๐ญ๐ž๐ซ ๐œ๐จ๐ฎ๐ซ๐ฌ๐ž - ๐’๐๐‹ ๐‚๐ก๐š๐ฆ๐ฉ๐ข๐จ๐ง๐ฌ ๐๐ซ๐จ๐ ๐ซ๐š๐ฆ!

"๐€ 8 ๐ฐ๐ž๐ž๐ค ๐๐ซ๐จ๐ ๐ซ๐š๐ฆ ๐๐ž๐ฌ๐ข๐ ๐ง๐ž๐ ๐ญ๐จ ๐ก๐ž๐ฅ๐ฉ ๐ฒ๐จ๐ฎ ๐œ๐ซ๐š๐œ๐ค ๐ญ๐ก๐ž ๐ข๐ง๐ญ๐ž๐ซ๐ฏ๐ข๐ž๐ฐ๐ฌ ๐จ๐Ÿ ๐ญ๐จ๐ฉ ๐ฉ๐ซ๐จ๐๐ฎ๐œ๐ญ ๐›๐š๐ฌ๐ž๐ ๐œ๐จ๐ฆ๐ฉ๐š๐ง๐ข๐ž๐ฌ ๐›๐ฒ ๐๐ž๐ฏ๐ž๐ฅ๐จ๐ฉ๐ข๐ง๐  ๐š ๐ญ๐ก๐จ๐ฎ๐ ๐ก๐ญ ๐ฉ๐ซ๐จ๐œ๐ž๐ฌ๐ฌ ๐š๐ง๐ ๐š๐ง ๐š๐ฉ๐ฉ๐ซ๐จ๐š๐œ๐ก ๐ญ๐จ ๐ฌ๐จ๐ฅ๐ฏ๐ž ๐š๐ง ๐ฎ๐ง๐ฌ๐ž๐ž๐ง ๐๐ซ๐จ๐›๐ฅ๐ž๐ฆ."

๐‡๐ž๐ซ๐ž ๐ข๐ฌ ๐ก๐จ๐ฐ ๐ฒ๐จ๐ฎ ๐œ๐š๐ง ๐ซ๐ž๐ ๐ข๐ฌ๐ญ๐ž๐ซ ๐Ÿ๐จ๐ซ ๐ญ๐ก๐ž ๐๐ซ๐จ๐ ๐ซ๐š๐ฆ -

BIG DATA INTERVIEW SERIES

This mock interview series is launched as a community initiative under the Data Engineers Club, aimed at supporting the community's growth and development.

Links to the free SQL & Python series I developed are given below -

Don't miss out - Subscribe to the channel for more such informative interviews and unlock the secrets to success in this thriving field!

Social Media Links :

TIMESTAMPS: Questions Discussed
00:50 Introduction
02:10 What sources do you use for data ingestion?
02:25 What connectors do you use for data ingestion?
02:45 How do you store and transform data after ingestion?
03:58 How are you preprocessing the data?
04:41 How do you eliminate duplicate records?
05:12 How do you ensure the correct record is kept when handling duplicates? (see the PySpark sketch after the timestamps)
05:50 How is your storage layer designed? Do you use mounting techniques?
06:04 Do you use delta files? Why?
07:00 What optimization techniques have you implemented?
08:05 Do you use partitions?
08:24 What factors do you consider when partitioning?
09:11 Do you use bucketing?
09:36 What are the use cases for partitioning and bucketing?
10:33 Besides broadcast joins, what other joins do you use?
10:52 Which join is the most efficient?
11:50 What is the difference between narrow and wide transformations?
12:26 What is your understanding of Spark and Databricks?
13:22 How do you consume data from the gold layer?
14:42 How do you connect Power BI to Azure Synapse?
15:46 Can you outline Spark architecture?
17:07 What is a DAG?
18:15 What is the difference between client mode and cluster mode?
19:29 Have you faced any challenges with cluster mode?
20:50 Why do DataFrames and Datasets exist?
22:17 What do you understand by normalization?
22:51 What other optimization techniques do you use?
23:33 SQL query
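
For reference, here is a minimal PySpark sketch of the kind of deduplication the questions at 04:41 and 05:12 are probing. The dataset, path, and column names (orders, order_id, updated_at) are hypothetical and not taken from the interview.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dedup-sketch").getOrCreate()

# Hypothetical bronze-layer input with possible duplicate order_id rows.
orders = spark.read.format("delta").load("/mnt/bronze/orders")

# Exact duplicates: drop rows that are identical across all columns.
exact_dedup = orders.dropDuplicates()

# "Correct record" duplicates: rank rows per business key by a timestamp
# and keep only the most recent version of each order_id.
w = Window.partitionBy("order_id").orderBy(F.col("updated_at").desc())
latest_per_key = (
    orders.withColumn("rn", F.row_number().over(w))
          .filter(F.col("rn") == 1)
          .drop("rn")
)
```

Keeping only the latest row per business key, rather than dropping arbitrary duplicates, is what "ensuring the correct record" usually means in this context.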

Music track: Retro by Chill Pulse
Background Music for Video (Free)

Tags
#mockinterview #bigdata #career #dataengineering #data #datascience #dataanalysis #productbasedcompanies #interviewquestions #apachespark #google #interview #faang #companies #amazon #walmart #flipkart #microsoft #azure #databricks #jobs
Comments
ะะฒั‚ะพั€

Source type, project discussion
Handling duplicates
Delta Lake features
Spark vs Databricks
Connecting Power BI to Synapse
Spark architecture
DAG
Client mode vs cluster mode
DataFrame vs Dataset
Normalisation
2nd highest salary in a department (see the sketch below)
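
The last item refers to the closing SQL exercise in the video. One common way to answer it, sketched here in PySpark with Spark SQL and an assumed employees table (name, dept_id, salary):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("second-highest-salary").getOrCreate()

# Hypothetical employees data: (name, dept_id, salary)
spark.createDataFrame(
    [("a", 10, 100), ("b", 10, 200), ("c", 10, 200), ("d", 20, 150), ("e", 20, 120)],
    ["name", "dept_id", "salary"],
).createOrReplaceTempView("employees")

# DENSE_RANK handles ties: the second distinct salary per department gets rank 2.
spark.sql("""
    SELECT name, dept_id, salary
    FROM (
        SELECT *,
               DENSE_RANK() OVER (PARTITION BY dept_id ORDER BY salary DESC) AS rnk
        FROM employees
    ) t
    WHERE rnk = 2
""").show()
```

DENSE_RANK is used rather than ROW_NUMBER so that tied salaries share a rank and the second distinct salary is still returned.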

hdr-tech
ะะฒั‚ะพั€

When someone says they are optimizing code in Databricks, they are mostly faking it 😂😂. Spark itself optimizes your code through the Catalyst optimizer / Spark SQL engine, and since Spark 3.0, with Adaptive Query Execution (AQE), joins are also optimized at runtime. We can alter the broadcast threshold, which is usually handled by the admin team during Databricks cluster creation.

The only things those two do not cover are user-defined logic such as UDFs and low-level RDD operations, which hardly anyone does in Databricks these days. The last one is manual caching.
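
For readers who want to see the knobs this comment refers to, here is a minimal PySpark sketch of enabling AQE and adjusting the auto-broadcast threshold. The values are illustrative, and in many Databricks workspaces these settings are controlled at cluster level by the admin team.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("aqe-sketch").getOrCreate()

# Adaptive Query Execution: re-optimizes join strategies, coalesces shuffle
# partitions, and mitigates skew at runtime (enabled by default in recent Spark).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Auto-broadcast threshold in bytes: tables smaller than this are broadcast
# automatically during joins; -1 disables auto-broadcasting entirely.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", str(100 * 1024 * 1024))

# A broadcast can also be requested explicitly for a single join:
# large_df.join(broadcast(small_df), "key")   # DataFrames not defined in this sketch
```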

gudiatoka
ะะฒั‚ะพั€

Seems like he came in having memorized everything by rote, bro :) ... anyway, he answered all the questions very well.

rainbowhappy