How to test your Data Pipelines with Great Expectations

In this video we are going to cover another library for Data Quality testing. Previously we used PyTest to carry out data quality tests; with PyTest we write our own functions to perform the testing. The Great Expectations library ships with built-in functions for data quality tests.
With Great Expectations, you can assert what you expect from the data you load and transform, and catch data issues quickly: Expectations are basically unit tests for your data. Great Expectations also creates data documentation and data quality reports from those Expectations.
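
For example, here is a minimal sketch of such a check with the classic pandas-style API (the file name "data.csv" and column name "customer_id" are only illustrative, not taken from the video):

import great_expectations as ge
import pandas as pd

# Wrap an ordinary pandas DataFrame as a Great Expectations dataset
df = ge.from_pandas(pd.read_csv("data.csv"))

# One Expectation = one unit test for the data; the result reports pass/fail and details
result = df.expect_column_values_to_not_be_null("customer_id")
print(result)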

#dataquality #Python #greatexpectations


Topics in this video (click to jump around):
==================================
0:00 Introduction to Great Expectations
0:38 Notebook & Data Import
1:08 Convert to Great Expectations DataFrame
1:41 Run your First Data Quality Test
2:39 Primary Key Tests: Column Exists, Unique, Null & Data Type
3:56 Test Values in Set
5:09 Test Values in Range
7:07 Save Tests for re-use
7:33 Re-Use Tests
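
The chapter topics map roughly onto the classic pandas-style Great Expectations API; a rough sketch of that flow is below (column names, value sets and file paths are illustrative, not taken from the video, and exact signatures can vary between versions):

import great_expectations as ge

# Load the data as a Great Expectations dataset
df = ge.read_csv("orders.csv")

# Primary key tests: column exists, unique, not null, expected data type
df.expect_column_to_exist("order_id")
df.expect_column_values_to_be_unique("order_id")
df.expect_column_values_to_not_be_null("order_id")
df.expect_column_values_to_be_of_type("order_id", "int64")

# Test values in a set and values in a range
df.expect_column_values_to_be_in_set("status", ["open", "shipped", "cancelled"])
df.expect_column_values_to_be_between("amount", min_value=0, max_value=10000)

# Save the collected Expectations so they can be re-used later
df.save_expectation_suite("order_expectations.json")

# Re-use the saved tests against a new batch of data
new_df = ge.read_csv("orders_new.csv")
results = new_df.validate(expectation_suite="order_expectations.json")
print(results["success"])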
Comments

This is the best basic tutorial of this tool I've been able to find; you have everything one would need to start, in a digestible way. Thanks!

fcastellano

this is gradually becoming my favourite channel.

bralabala

I love your videos, man!
Try publishing them more on subreddits like r/datascience and r/dataengineering.

nizar

This is another masterpiece video.
I am struggling with the scenarios below; if possible, could you please explain them in your upcoming videos?

1. How to read the latest file. Suppose my source folder contains many files and I want to read only the latest one.

2. I want to create a Python script that reads, processes and loads the data into a DB table when a file arrives in the source folder.

Sreenu

Thanks for the video. What if I have a composite primary key, will that work?

It's failing for me, so could you help me with this?

dileepkumar-dkkc

Can you please make a video on how to create a custom Expectation, e.g. using a query? And then how to apply it for DQ checks?

swagatdash

Is there a way to create the rules/tests automatically? Is there a better way to visualise a summary of tests?

palermodpr

Where can we find the videos on setting up the PyTest framework to validate data? Kindly help with the video link.

vishalkoundal

Hi, can we test CSV file data against a database table with some Expectations?

VB-rffv

Good explanation.
But you need to update the video: "get_expectations_config()" is no longer available in the Great Expectations framework. I used your code to run my checks, and it failed at the "get_expectations_config()" step.

sirajansari
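
For anyone hitting the same problem: in later Great Expectations releases, get_expectations_config() and save_expectations_config() were renamed to get_expectation_suite() and save_expectation_suite(). A hedged sketch of the newer calls on the same kind of pandas-style dataset (the file and column names are illustrative):

import great_expectations as ge

df = ge.read_csv("data.csv")
df.expect_column_values_to_not_be_null("id")

# Replaces the removed get_expectations_config() / save_expectations_config() calls
suite = df.get_expectation_suite()
df.save_expectation_suite("my_expectations.json")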

Please create end-to-end Python projects for Data Analysts.

balakrishnaprasad

Why is every tutorial using a local client? It's a very unusual case, because in the real world you would have to load all the data into memory to execute this quality analysis.

Idle

Did they change the structure? Why didn't you talk about validators and checkpoints? I am stuck here:

import great_expectations as gx

context = gx.get_context()

validator = context.sources.pandas_default.read_csv(
    "data.csv"
)

checkpoint = context.add_or_update_checkpoint(
    name="my_quickstart_checkpoint",
    validator=validator,
)

checkpoint_result = checkpoint.run()

ERROR: expectation_suite default not found

Memes_uploader
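
A note on the error in the last comment: in the newer fluent API, the checkpoint looks up a saved Expectation Suite, and the quickstart flow adds at least one Expectation and persists it before running the checkpoint. A hedged sketch of that likely missing step, assuming GX 0.16+ and the same "data.csv" file from the comment (the column name "id" is illustrative):

import great_expectations as gx

context = gx.get_context()
validator = context.sources.pandas_default.read_csv("data.csv")

# Record at least one Expectation, then save the suite so the checkpoint can find it
validator.expect_column_values_to_not_be_null("id")
validator.save_expectation_suite()

checkpoint = context.add_or_update_checkpoint(
    name="my_quickstart_checkpoint",
    validator=validator,
)
checkpoint_result = checkpoint.run()
print(checkpoint_result.success)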