How to test your Python ETL pipelines | Data pipeline | Pytest
In this tutorial we cover how to test ETL pipelines. I have received a number of questions about testing, especially testing the data pipelines we build with Python. Testing is an important part of any ETL pipeline: it helps ensure that the information we deliver to our stakeholders is current, consistent and accurate.
Therefore, it is always a good idea to put test cases in place to catch data anomalies. A failing test can tell us that:
• An assumption about our source data is incorrect. For example, a column we expected never to be null contains nulls, or a column we expected to contain unique values contains duplicates.
• Our transformation logic contains a flaw.
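The two source-data assumptions above (no nulls, no duplicates) can be written as small checks. This is a minimal sketch, assuming pandas and a hand-built sample frame; the helper names `has_no_nulls` and `is_unique` and the sample values are hypothetical and not taken from the video.

```python
import pandas as pd


def has_no_nulls(df: pd.DataFrame, column: str) -> bool:
    """Return True only if the column contains no null values."""
    return bool(df[column].notnull().all())


def is_unique(df: pd.DataFrame, column: str) -> bool:
    """Return True only if every value in the column appears exactly once."""
    return bool(df[column].is_unique)


# Hypothetical sample data: one clean frame, one with a duplicate key and a null.
clean = pd.DataFrame({"ProductKey": [1, 2, 3]})
dirty = pd.DataFrame({"ProductKey": [1, 1, None]})


def test_clean_data_passes():
    assert has_no_nulls(clean, "ProductKey")
    assert is_unique(clean, "ProductKey")


def test_dirty_data_is_detected():
    assert not has_no_nulls(dirty, "ProductKey")
    assert not is_unique(dirty, "ProductKey")
```

Running `pytest` on a file containing this sketch collects the two `test_` functions. In a real pipeline the frames would come from the extract step rather than being built inline.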
Erratum in the tests: one of the viewers pointed out that the null check was always returning true. It has been revised to return false when nulls are present. The test_null_check function is updated as follows:
def test_null_check(df):
    assert df['ProductKey'].notnull().all()
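The `df` argument has to be supplied from somewhere, and in pytest that is typically a fixture. The sketch below assumes an in-memory frame stands in for the real extract. The `load_products` helper, the `Color` column and the sample values are hypothetical, since the description does not show how the data is loaded.

```python
import pandas as pd
import pytest


def load_products() -> pd.DataFrame:
    # Hypothetical stand-in for the real extract step (database query, CSV read, ...).
    return pd.DataFrame(
        {"ProductKey": [101, 102, 103], "Color": ["Red", "Black", "Red"]}
    )


@pytest.fixture
def df() -> pd.DataFrame:
    # pytest passes this return value to any test that declares a `df` parameter.
    return load_products()


def test_null_check(df):
    # Fails as soon as a single ProductKey is null.
    assert df["ProductKey"].notnull().all()
```

Keeping the loading logic in a plain function and wrapping it in a fixture means every test gets a fresh frame without repeating the load code.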
#pytest #etl #python
Topics covered in this video:
0:00 - Introduction to ETL testing
0:56 - Benefit of testing
1:32 - Pytest testing library overview
2:26 - Pytest setup
3:05 - Import Data
3:36 - First test - column check
6:08 - Primary key column tests
7:22 - Pytest features
8:15 - Data Type check
9:36 - Expected Values check
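The column, data type and expected-values checks listed above might look roughly like the following. This is only a sketch under stated assumptions: the sample frame, the `Color` and `ListPrice` columns and the allowed color set are hypothetical, and the tests in the video may be written differently.

```python
import pandas as pd

# Hypothetical sample standing in for the extracted table.
df = pd.DataFrame(
    {
        "ProductKey": [101, 102, 103],
        "Color": ["Red", "Black", "Red"],
        "ListPrice": [9.99, 24.50, 9.99],
    }
)


def test_column_exists():
    # Column check: the key column must be present after the extract.
    assert "ProductKey" in df.columns


def test_dtypes():
    # Data type check: keys are integers, prices are floats.
    assert pd.api.types.is_integer_dtype(df["ProductKey"])
    assert pd.api.types.is_float_dtype(df["ListPrice"])


def test_expected_values():
    # Expected values check: every color must belong to a known set.
    allowed = {"Red", "Black", "Silver"}
    assert set(df["Color"].unique()) <= allowed
```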