SQL Best Practices - Designing An ETL - Part 1

preview_player
Показать описание
Data engineering has many facets. One of the most common projects a data engineer takes on is developing an ETL pipeline from an operational DB to a data warehouse. Our team wanted to cover the overarching design of an ETL.

What are the typical principal components, stages, considerations, etc?
We started this by first writing Creating An ETL Part 1(more to come)

and we and now have worked on a video that is below that walks through the process. We wanted to discuss why each stage is important and what occurs when data goes from raw to stage, why do we need a raw database and so on.

Data engineering is a complex discipline that partners automation, programming, system design, databases, and analytics in order to ensure that analysts, data scientists and end-users have access to clean data.

This all starts with the basic ETL design.

We are practicing up for a webinar that we will be hosting on ETL development with Python and SQL. The webinar itself will be much more technical and dive much deeper into each component described in the video. However, we wanted to see how using the whiteboard was like in case we need it.

If you enjoyed this video, check out some of my other top videos.

The Ultimate Guide To Starting A Data Consulting Company In 2024 | Data Consulting 101

Data Camp Vs Data Quest - Which Data Engineering Course Is Best?

Looking to start you're own data engineering/analytics consulting company, then you should check out my new course here

If you'd like to read up on my updates about the data field, then you can sign up for our newsletter here.

Or check out my blog

Tags: Data engineering projects, Data engineer project ideas, data project sources, data analytics project sources, data project portfolio

_____________________________________________________________
_____________________________________________________________
About me:
I have spent my career focused on all forms of data. I have focused on developing algorithms to detect fraud, reduce patient readmission and redesign insurance provider policy to help reduce the overall cost of healthcare. I have also helped develop analytics for marketing and IT operations in order to optimize limited resources such as employees and budget. I privately consult on data science and engineering problems both solo as well as with a company called Acheron Analytics. I have experience both working hands-on with technical problems as well as helping leadership teams develop strategies to maximize their data.

*I do participate in affiliate programs, if a link has an "*" by it, then I may receive a small portion of the proceeds at no extra cost to you.
Рекомендации по теме
Комментарии
Автор

This is down to earth . I do appreciate your level of understanding and passing it on to me as a beginner

abnuru
Автор

By far one of the best i have seen on data mapping and stages of ETL...Thanks Ben! Keep them coming!!

yaqubhassan
Автор

Very interesting video, with a lot of ideas. Because of the title, the content of the video is bit different, I thought it was more focused in presenting the "players" to construct an ETL.
The idea of raw data seems to come more from the operational database than from the flat files (csv, etc), and after the cleaning/staging/mapping comes the Stage DB and then the Data warehouse.
I can't wait to see part 2 !

higiniofuentes
Автор

Is a Part 2 coming? I could not find it. Very good vid!

passais
Автор

Hi there.
Got a question on raw data (flatfiles).
These doesn't have identities or keys so you formulate a candidate key by combining some columns (product, location, target_year).
Here's the question, if there are some columns to correct and it belongs to the combination of candidate keys, how can the data be corrected or updated? What approach is to be made ?

EagleEye
Автор

As an financial auditor I want to extract data from our clients database and then manipulate it to have auditiformation. Is learning SQL language the best thing to do? Like to hear form you.

hakank.
Автор

What about removing nulls and malformed entries -- would you recommend doing that prior to Staging or afterwards?

lambdakicks
join shbcf.ru