Data Engineering Interview

Big Data Mock Interview

Join Nisha, an experienced Senior Data Engineer, and Xian for an exciting and informative Data Engineering mock interview session.

If you're preparing for a Data Engineering interview, this is the perfect opportunity to enhance your skills and increase your chances of success. The mock interview simulates a real-life interview scenario and provides valuable insights and guidance. The topics covered include Apache Spark, SQL, ETL pipelines, data modelling, database technologies, cloud platforms, CI/CD, and more. You'll get to see how professionals tackle technical questions and problem-solving challenges in a structured and efficient manner.

By watching this mock interview, you'll learn effective strategies for approaching technical questions and problem-solving scenarios, and gain familiarity with the data engineering interview process and format. You'll also enhance your communication skills and ability to articulate your thoughts clearly, identify areas for improvement, receive expert feedback on performance, boost your confidence, and reduce nervousness for future interviews.

This mock interview suits all levels of experience, whether you're a fresh graduate, a career changer, or a seasoned professional looking to improve your interview skills. Don't miss out on this invaluable learning experience! Subscribe to our channel and hit the notification bell to be notified when the mock interview is released. Stay tuned for a deep dive into the world of data engineering.

𝙐𝙨𝙚𝙛𝙪𝙡 𝙇𝙞𝙣𝙠𝙨:

Subscribe now and be the first to watch the Big Data Mock Interview with Nisha & Xian

🔅 Xiandong (Interviewee)'s LinkedIn profile -

Chapters:


#dataengineering #interview #interviewquestions #bigdata #mockinterview #awss3 #clouds #pyspark #sql #snowflake #apachespark #aws
Comments

The interviewer was brutal asking about IAM and connecting to S3 lol

danielandrews

Here are some answers to questions that I think the candidate didn't answer correctly.
Q - To avoid duplication when an ETL job is rerun: use incremental loads, or use staging tables; when you load data into the staging tables, run deduplication steps before merging into the target.
Q - If a query is taking too much time and resources: check the DDL of the tables to analyze the indexes and see whether the filtering is based on them. Use the explain plan to check whether full table scans are being performed; if so, rewrite the query so that index-based scans are used. Certain joins, such as cross joins or full outer joins, can also cause a query to run slowly.

Please correct me
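
The staging-table approach described in this comment can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3; the `orders` / `orders_staging` table and column names are hypothetical, not from the video.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Target table with a unique business key.
cur.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
# Staging table: raw load, duplicates allowed.
cur.execute("CREATE TABLE orders_staging (order_id INTEGER, amount REAL)")

# Simulate a rerun: the same batch lands in staging twice.
batch = [(1, 10.0), (2, 20.0), (3, 30.0)]
cur.executemany("INSERT INTO orders_staging VALUES (?, ?)", batch * 2)

# Deduplicate within staging and skip rows already present in the target,
# so rerunning the job cannot create duplicates.
cur.execute("""
    INSERT INTO orders (order_id, amount)
    SELECT DISTINCT s.order_id, s.amount
    FROM orders_staging s
    WHERE NOT EXISTS (
        SELECT 1 FROM orders t WHERE t.order_id = s.order_id
    )
""")
conn.commit()

count = cur.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 3 rows in the target, despite the duplicated load
```

The same `SELECT DISTINCT ... WHERE NOT EXISTS` pattern (or a `MERGE` statement, where the engine supports it) is what makes the load idempotent across reruns.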

shomailnajeeb

The "design a pipeline for historical data and then implement CDC" question is a very good one. I could not find any resources with design questions like that. It's this type of data design question, along with questions like "given an API, do some manipulation on an existing dataset in Spark", that is hard to find; Spark optimization and SQL material are readily available elsewhere.
If you could include more such questions in upcoming videos, it would be very helpful for FAANG prep.
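
The "historical load, then CDC" design this comment refers to can be sketched at a toy level: the target starts from a full snapshot, then each change event is merged in by key. The event shape (`id`/`op`/`data`) here is a hypothetical illustration, not the format used in the interview.

```python
# Toy sketch: initial full (historical) load, then incremental CDC merge.
def apply_cdc(target: dict, changes: list) -> dict:
    """Merge a batch of CDC events into the target, keyed by primary key."""
    for event in changes:
        key, op = event["id"], event["op"]
        if op in ("insert", "update"):
            target[key] = event["data"]  # upsert semantics
        elif op == "delete":
            target.pop(key, None)        # tolerate deletes for missing keys
    return target

# Historical (one-time full snapshot) load.
target = {1: {"name": "alice"}, 2: {"name": "bob"}}

# Incremental CDC batch from the change feed.
changes = [
    {"id": 2, "op": "update", "data": {"name": "bob jr"}},
    {"id": 3, "op": "insert", "data": {"name": "carol"}},
    {"id": 1, "op": "delete"},
]
apply_cdc(target, changes)
print(sorted(target))  # [2, 3]
```

In a real pipeline the dict would be a table and the merge a `MERGE INTO` (or equivalent) driven by the change feed, but the key/op/upsert-or-delete logic is the same.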

AnirudhaJoshi-jp

Can't we check the unique id before inserting the data?
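
One way to do exactly what this comment suggests is to declare the unique id as a constraint and let the database reject or skip duplicates at insert time. A minimal sqlite3 sketch (the `events` table name is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (event_id INTEGER PRIMARY KEY, payload TEXT)")

rows = [(1, "a"), (2, "b"), (1, "a")]  # a rerun delivers event 1 twice
# INSERT OR IGNORE skips rows whose unique id already exists,
# so repeated loads cannot create duplicates.
cur.executemany("INSERT OR IGNORE INTO events VALUES (?, ?)", rows)
conn.commit()

n = cur.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(n)  # 2
```

Other engines spell this differently (e.g. PostgreSQL's `INSERT ... ON CONFLICT DO NOTHING`), and at data-warehouse scale the same check is usually done as a join against the target rather than row by row.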

PranavSaw