Ali Zaidi - 10 things I learned about writing data pipelines in Python and Spark.

PyData London 2016

I am a Data Engineer at Duedil, a fintech enabling access to public data about private companies.

Starting in Q4 2015, I wrote a financials data pipeline that collates ~200 data points and calculates ~300 metrics for ~80M account filings from ~11M private companies.

I used Python, Spark, and loads of good fortune to build it. I would like to share my journey with the PyData community, purely to give something back, as I have learned so much from the meetups.
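For a flavour of the kind of pipeline described above, here is a minimal PySpark sketch; it is not the actual Duedil code, and the paths and column names (revenue, gross_profit, current_assets, etc.) are hypothetical stand-ins for the ~200 collated data points and ~300 derived metrics.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("financials-pipeline-sketch").getOrCreate()

# One row per account filing, with the collated raw data points as columns.
# The source path and schema below are placeholders, not the real ones.
filings = spark.read.parquet("s3://example-bucket/filings/")

# Derive a couple of example metrics; the real pipeline computes ~300 of them.
metrics = (
    filings
    .withColumn("gross_margin", F.col("gross_profit") / F.col("revenue"))
    .withColumn("current_ratio", F.col("current_assets") / F.col("current_liabilities"))
    .select("company_id", "filing_id", "gross_margin", "current_ratio")
)

# Write the derived metrics back out for downstream consumers.
metrics.write.mode("overwrite").parquet("s3://example-bucket/metrics/")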

My talk covers takeaways, patterns, anti-patterns, and the mistakes (big and small) that I made and learned from. I think it will be very useful for beginner-to-intermediate data wranglers.


Comments

Good talk, but nothing about Spark; it's just plain Python. Misleading title.

spatshello