Dask DataFrame is fast now - Florian Jetter (Coiled) @ PyData Südwest

Live stream of the PyData Südwest meetup

20:15 Talk / Q&A
20:45 Lightning Talks

Ask questions via Slido:

📺 Dask DataFrame is fast now - Florian Jetter (Coiled)

Dask is a library for distributed computing with Python that integrates tightly with pandas. Historically, Dask was the easiest choice to use (it's just pandas) but struggled to achieve robust performance (there were many ways to accidentally perform poorly). The re-implementation of the DataFrame API addresses all of the pain points that users ran into. We will look at how Dask is a lot faster now, how it performs on benchmarks that it struggled with in the past, and how it compares to other tools like Spark, DuckDB, and Polars.

Florian Jetter leads the Dask Engineering team at Coiled Computing. He is a long-term Dask core maintainer and an expert in distributed cloud computing and data storage.

⚡️ Lightning Talks:
1. Tim Berti - A case study of custom kernels
2. Natalia Mokeeva - Find the best strategy to get involved in Open Source
3. Dr. Lisa A. Chalaguine - Legal Argument Mining from Court Decisions - A Fly

A big thank you to our sponsors:
QuantCo for hosting.
Pioneer Hub for supporting the organization.