Polars: Working with Data Larger than RAM


---------------------------------------------------------------------------------------------------

This video is the fifth in a tutorial series on Polars. I explain how to work with larger-than-RAM data using a dataset that would take 80 GB of space in CSV format.

Polars is a FAST DataFrame library for Python that has been gaining a lot of attention recently and might replace pandas entirely.

I hope you enjoy this series! Please subscribe and like the video to support the channel.

Timeline:
0:00 Intro
0:39 Reading a sample file
1:40 Aggregate data on the sample
2:42 Lazy Mode
3:23 Aggregating out-of-RAM data
4:30 Data Visualization
Comments

Fantastic. Thank you for showing how to use Polars' streaming capabilities.

RobbDunlap

Many thanks for sharing; you've got a new star on your GitHub polars-tutorial repo.

boristherin

You are a genius! Fantastic video! Thanks!

multitaskprueba

I wonder how it would handle removing duplicates. If only a chunk of data is read at a time, you can remove dupes within that chunk, but what about dupes across chunks?

rokaskarabevicius
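Conceptually, chunk-wise deduplication only catches cross-chunk duplicates if some state survives from one chunk to the next. The pure-Python sketch below (hypothetical helper names, not Polars internals) contrasts per-chunk deduplication with a global "seen" set. In Polars itself you would call `.unique()` on the lazy query and let the engine handle this; how much of that work runs out of core may depend on the Polars version.

```python
from collections.abc import Hashable, Iterable, Iterator


def dedup_per_chunk(chunks: Iterable[list[Hashable]]) -> Iterator[Hashable]:
    """Naive approach: removes duplicates only *within* each chunk."""
    for chunk in chunks:
        # dict.fromkeys keeps the first occurrence and preserves order.
        yield from dict.fromkeys(chunk)


def dedup_global(chunks: Iterable[list[Hashable]]) -> Iterator[Hashable]:
    """Remembers every key seen in earlier chunks, so cross-chunk
    duplicates are dropped too. Memory grows with the number of
    distinct keys, not with the total number of rows."""
    seen: set[Hashable] = set()
    for chunk in chunks:
        for row in chunk:
            if row not in seen:
                seen.add(row)
                yield row


chunks = [[1, 2, 2, 3], [3, 4, 1]]
print(list(dedup_per_chunk(chunks)))  # [1, 2, 3, 3, 4, 1]
print(list(dedup_global(chunks)))     # [1, 2, 3, 4]
```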

Wow, this is excellent content. Could you explain more about how aggregation works internally? I am following Dask and Polars closely, and apparently, when you perform an operation in Dask, say the sum of an integer column, it calculates the sum of that column in each Parquet file (a partition in Dask) and finally returns the sum of all the partition sums. Can you explain how the same aggregation works in Polars?

chaitanyamadduri
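The idea the commenter describes is partial aggregation: each chunk (or partition) is reduced on its own, and the partial results are then merged. Below is a minimal pure-Python sketch with hypothetical function names; it is not taken from Polars' source, though chunked and streaming engines generally rely on this pattern for aggregations that decompose this way. A mean has to be carried as a (sum, count) pair, because averaging per-chunk means gives the wrong answer when chunks differ in size.

```python
from collections.abc import Iterable


def chunked_sum(chunks: Iterable[list[int]]) -> int:
    # Each chunk produces a partial sum; the partials are then combined.
    partials = [sum(chunk) for chunk in chunks]
    return sum(partials)


def chunked_mean(chunks: Iterable[list[float]]) -> float:
    # Each chunk contributes its sum and its row count; the mean is only
    # computed once, after all partial results have been merged.
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    return total / count


chunks = [[1, 2, 3], [4], [5, 6]]
print(chunked_sum(chunks))   # 21
print(chunked_mean(chunks))  # 3.5
```

For these chunks, averaging the per-chunk means (2, 4 and 5.5) would give about 3.83 instead of the correct 3.5.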

I would like to note that Polars streaming works only if the source Parquet files are local. If the same files are on remote storage, like S3, MinIO or GCS, it does not work.

elephantum

It's a shame Polars does not support streaming from a SQL DB yet, as far as I can tell. I have to use Dask.

joschomo

Does this work for streaming CSV files instead of Parquet? 😊

murphygreen