EuroSciPy 2023 - Keynote: Polars


Polars is a relatively new, fast DataFrame implementation that redefines what DataFrames can do on a single machine, in terms of both performance and dataset size.
In this talk, we will dive into Polars and see what makes it so efficient. The talk will touch on technologies like Arrow, Rust, parallelism, data structures, query optimization and more.
Comments
Author

I have neither the knowledge nor the time to do benchmarking, but I was using pandas' "append" to combine about 8000 CSV files (about 10 GB in total) and it was taking almost an hour and a half. I decided to try Polars. According to Stack Overflow I could use concat, vstack, or extend. I randomly chose "vstack", and it did the same workload in less than 1 minute: same computer, same Python version, same everything. All I had to do was modify the script a little bit, for example remove "index = False" when exporting the resulting (huge) dataframe to CSV.

iutubtivi
Author

The API is very similar to PySpark's. In fact, I don't think it would be a hassle to convert existing pipelines to Polars.

Molox