How to Use Pandas With Pandera to Validate Your Data in Python

Type hints and annotations are not enough when you are using pandas for data analysis in Python. You need validation! Today I’ll show you how to work with Pandera to quickly and easily validate your dataframes.

🎓 Courses:

👀 Code reviewers:
- Yoriz
- Ryan Laursen
- Dale Hagglund

🔖 Chapters:
0:00 Intro
0:47 Type annotations with pandas
3:11 Pandera validation
4:23 Pandera dtypes
4:43 Pandera integration
5:00 Code examples
10:48 Outro

#arjancodes #softwaredesign #python

DISCLAIMER - The links in this description might be affiliate links. If you purchase a product or service through one of those links, I may receive a small commission. There is no additional charge to you. Thanks for supporting my channel so I can continue to provide you with free content each week!
Comments

Great video, Arjan! It would be great to see the integration with SQLModel, since you often want to save the data to a DB without repeating the schemas.
Thank you for the content!

jorgesilva

This is the only channel where I use Super Thanks. Your channel is amazing and helps me grow as a Python developer. Thanks!

brunosompreee

Great tutorial. Clean presentation and motivation for use. Pandera was in my toolbox to use in a pandas project. I'll follow up with this clean setup using pydantic. I'd be interested in the integration with FastAPI.

Thank You!

ronaldokun

A Series can be not only a row but also a column of a DataFrame.

RatafakRatafak
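
To make the point above concrete, here is a small pandas sketch. The dataframe contents are made up for illustration.

```python
import pandas as pd

# Hypothetical data, only used to show the two ways a Series arises.
df = pd.DataFrame({"name": ["Ada", "Bob"], "age": [36, 41]})

column = df["age"]  # a column: a Series indexed by the row labels
row = df.iloc[0]    # a row: a Series indexed by the column names

column_labels = list(column.index)
row_labels = list(row.index)
```

Both objects are `pd.Series`; what differs is the index each one carries.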

I always envied C#'s FluentValidation package. It made validating data objects so easy and readable. Glad to see Python has something similar with Pandera!

CaptainCsaba

Just discovered this library last week. Amazing. Thank you

davidl

Love your videos, always simple, short, and to the point.

xlrx

Love this! I would like more examples of integration with Hypothesis too!

AbdolaMike

Sounds very useful. Thanks for sharing.

fzfgru

Wonderful series! Adding FastAPI is a good shout. Or perhaps an ORM into some SQL database? Not sure if that makes sense. In any case - VALIDATED 🔥

hudabdulwahab

Great video! :)
I would love a tutorial about the Pint package for working with physical/scientific units, including your take on type hinting and validating function inputs.

silaseul

Glad to see an integration of pydantic with this; a schema file was not practical for a new developer coming into the codebase. The downside is that we rely on two libraries, but I believe it's worth it for now.

ThuBomb

Thanks for sharing! I would like to know how to integrate with FastAPI. 😄

luizmatias

Thanks, it was indeed useful for me. I did not know about Pandera.

edward

Well, hmmm, interesting)
Would be great to see more on integrations

dmitrykuleshov

As always, your video is interesting and helpful! I really want to dive deep into the integration with FastAPI!

MinhVu-ymtk

Hello Arjan,

Thank you for this great overview. I have a couple of follow-up questions.

What kind of validation does `pandera` support?
Can I have:
1) fuzzy checks, e.g. I expect a value to be non-NULL, but I accept a few NULLs?
2) multi-column checks, e.g. if df["column_a"] == xx then df["column_b"] must be int, otherwise float?
3) expectations about the shape of the data, e.g. using a Z-test to compare it with a given distribution?

Otherwise, this library is pretty useless. I can implement similar checks in a few minutes on my own ;)

Dendus

I have changed quite a bit in my programming techniques since I started watching your series of videos. Among other things, I now add type hints when I define new methods.
I am also a pandas user. But when I see the type hints that you propose for pandas dataframes, I get the feeling that this is a bit over the top for me. I can understand it may be valuable in a professional software development department, but for an amateur programmer this is a bit too much, I think.
I also think you could change your title to include "Pydantic", since in the end you propose using Pydantic instead of (or combined with) Pandera.

ErikS-

You can have 500 columns in a dataframe. OK, you will write a lot.


The inferred schema might help.

alexandrodisla

As always a great video. Thanks a lot :)

colonellucasl