WES MCKINNEY + DAVID PALAITIS - Streaming Featurization with Ibis, Substrait and Apache Arrow

Wes McKinney is an open-source software developer focusing on analytical computing. He is the author of Python for Data Analysis, creator of the Python pandas project, and co-creator of Apache Arrow.
David Palaitis is an engineering manager at Two Sigma Investments. He works on the compute and analysis platform that powers the firm's research and investment strategies across the world’s financial markets.

In this talk, you'll learn how Two Sigma and Voltron Data are collaborating to improve the performance of featurization workflows using the Ibis, Substrait, and Apache Arrow software stack. Together, the three provide a complete solution for real-time streaming data processing: Ibis provides high-level Python APIs for data processing and analysis, Substrait provides a cross-language specification for representing query plans so that they can be passed to different compute engines, and Apache Arrow provides a high-performance columnar data representation for both. This stack makes it possible to process large amounts of streaming data in real time, providing fast and accurate insights for decision-making.

Key Topics:
0:00 - Introductions
0:30 - How I Met Wes McKinney
1:15 - Timeline of Open Source Data Science at TS
7:06 - Featurization Challenges
9:40 - About Wes McKinney
11:48 - Apache Arrow
19:05 - Ibis
22:09 - Substrait
23:25 - One Data Science Interface; Many Data Engines
25:15 - Look Ahead

Related Blog:

Follow ODSC on: