How to test your Python ETL pipelines | Data pipeline | Pytest

In this tutorial we cover how to test ETL pipelines. I have received a number of inquiries about testing, especially testing the data pipelines we build using Python. Testing is an important aspect of ETL pipelines: it ensures we deliver accurate information to our stakeholders. We want to make sure our data is current, consistent, and accurate.
Therefore, it is always a good idea to put test cases in place to catch data anomalies. A failing test can tell us that:
• An assumption about our source data is incorrect. For example, a column we expected never to be null contains nulls, or a column we expected to contain unique values contains duplicates.
• Our transformation logic is flawed.
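The two source-data assumptions mentioned above (no nulls, no duplicates) can be sketched as small pandas checks wrapped in test functions. This is only an illustration: the `products` frame, the `ProductName` column, and the helper names `check_no_nulls` and `check_unique` are made up for the example and are not taken from the video.

```python
import pandas as pd

# Hypothetical sample data; in a real pipeline this would be the extracted table.
products = pd.DataFrame({
    "ProductKey": [1, 2, 3],
    "ProductName": ["Bike", "Helmet", "Gloves"],
})


def check_no_nulls(df, column):
    """Return True only if the column has no missing values."""
    return bool(df[column].notnull().all())


def check_unique(df, column):
    """Return True only if every value in the column is distinct."""
    return bool(df[column].is_unique)


def test_product_key_not_null():
    assert check_no_nulls(products, "ProductKey")


def test_product_key_unique():
    assert check_unique(products, "ProductKey")
```

Saved in a file whose name starts with `test_`, these functions are collected and run by the `pytest` command.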

Erratum in the tests: One of the viewers pointed out that the null check always returned true. It has been revised to return false when nulls are present. The updated test_null_check function is as follows:

def test_null_check(df):
    assert df['ProductKey'].notnull().all()

#pytest #etl #python

Topics covered in this video:
0:00 - Introduction to ETL testing
0:56 - Benefit of testing
1:32 - Pytest testing library overview
2:26 - Pytest setup
3:05 - Import Data
3:36 - First test - column check
6:08 - Primary key column tests
7:22 - Pytest features
8:15 - Data Type check
9:36 - Expected Values check
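
The column, data type, and expected-values checks named in the topics above could look roughly like the sketch below. The sample frame, the `Color` column, and the constants `EXPECTED_COLUMNS` and `ALLOWED_COLORS` are hypothetical and are not the dataset or names used in the video.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Hypothetical sample data standing in for the loaded table.
df = pd.DataFrame({"ProductKey": [1, 2, 3], "Color": ["Red", "Black", "Red"]})

EXPECTED_COLUMNS = ["ProductKey", "Color"]
ALLOWED_COLORS = {"Red", "Black", "Blue"}


def test_columns_present():
    # Column check: the frame has exactly the expected columns, in order.
    assert list(df.columns) == EXPECTED_COLUMNS


def test_product_key_dtype():
    # Data type check for a numeric key column.
    assert is_integer_dtype(df["ProductKey"])


def test_color_values_are_strings():
    # A column-level dtype comparison cannot tell pure-string columns from
    # mixed ones stored as object, so check each element instead.
    assert df["Color"].map(lambda v: isinstance(v, str)).all()


def test_color_expected_values():
    # Expected values check: every value belongs to the allowed set.
    assert df["Color"].isin(ALLOWED_COLORS).all()
```

Checking element types rather than the column dtype is a design choice here: an object-typed column can hold a mix of strings and other values, so a dtype comparison alone may pass on data that is not all strings.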
Comments

Heartfelt thanks to you for all these recorded sessions/tutorials. You have made life so simple.

teoymou

The best data engineering YouTuber I've had the pleasure to find. Thanks and please keep it up!

willosullivan

Articulate explanation! You’re the best! Thank you so much.

poojaak

You did a great job. I was looking for the same material for a long time. Thanks, man, for sharing great content.
I have many questions on pytest and will ask them once I go through all the videos. Thanks.

Sreenu

The best data engineering YouTuber. Thank you.

farhadshakibaca

Great and very helpful content. Thank you.

soheilahg

Thank you for a great tutorial!
You already have a few different videos. Can you add a number to each tutorial (to order them)? It would help show which video is the first and which one is the last.

gulnarabekirova

Could you please do this with Apache Beam, from a JDBC source to BigQuery? Or could you help me with this? I really need this kind of information.

ashishvats

Thanks for this video. Is there a video on how to do these runs on SQL Server, pgAdmin, or Athena?

MyChannel-nsct

Thanks for such important info.
How to automate these test cases?

bharamkarvivek

How can I add a logger to it with a tqdm progress bar?

SP-dbsh

Please make a video on ETL automation testing from scratch, and make separate playlists.

kiranpatil

The function test_null_check(df) will always pass.

lalalf

def test_Genre_dtype_str(df):
    assert (df["Genre"].dtype == str or df["Genre"].dtype == 'O')
This test case always passes.

dmunagala