Jiaqi Liu - Building a Data Pipeline with Testing in Mind - PyCon 2018

Speaker: Jiaqi Liu

It’s one thing to build a robust data pipeline in Python, but a whole other challenge to find the tooling and build out the framework that allows a data process to be tested. To truly iterate on and develop a codebase, one has to be able to test confidently during development and monitor the production system.

In this talk, I hope to address the key components of building out end-to-end testing for data pipelines by borrowing concepts from how we test Python web services. Just as we check for healthy status codes in our API responses, we want to be able to check that a pipeline works as expected given the correct inputs. We’ll talk about the key features that make a data pipeline easily testable and how to identify time-series metrics that can be used to monitor the health of a data pipeline.
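
The idea of checking a pipeline against known inputs and tracking health metrics can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the `RunStats` class, the `clean_records` step, and the field names `user_id` and `amount` are all hypothetical, and a real pipeline would likely send these counters to a metrics system rather than keep them in memory.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Hypothetical per-run counters that could be emitted as time-series metrics."""
    records_in: int = 0
    records_out: int = 0
    records_rejected: int = 0

    @property
    def rejection_rate(self) -> float:
        # Guard against division by zero on an empty run.
        if self.records_in == 0:
            return 0.0
        return self.records_rejected / self.records_in


def clean_records(rows, stats):
    """Hypothetical pipeline step: drop incomplete rows and normalize types."""
    cleaned = []
    for row in rows:
        stats.records_in += 1
        user_id = row.get("user_id")
        amount = row.get("amount")
        if user_id is None or amount is None:
            # Count the bad row instead of failing the whole run.
            stats.records_rejected += 1
            continue
        cleaned.append({"user_id": str(user_id), "amount": float(amount)})
        stats.records_out += 1
    return cleaned


def test_clean_records_known_input():
    """Given a fixed input, the step should produce a fixed, known output."""
    stats = RunStats()
    rows = [
        {"user_id": 1, "amount": "2.5"},
        {"user_id": None, "amount": 3},
    ]
    assert clean_records(rows, stats) == [{"user_id": "1", "amount": 2.5}]
    assert stats.records_rejected == 1
```

Under these assumptions, a test runner such as pytest could collect `test_clean_records_known_input` during development, while a value like `rejection_rate` could be recorded on every production run and alerted on if it drifts from its usual range.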

Comments

I'm starting to learn about data pipelines. This was a great intro.

chenjus

One of the best talks out there on data pipelines. The example she gave was very solid. I wish she had gone more in depth.

dragonfly