Polars vs Pandas | detailed test with explained results

Polars is one of the top trending Python frameworks, and it beats Pandas in many performance tests. This video presents 8 distinct tests that demonstrate the differences between Pandas and Polars, measured as duration in seconds while running specific functions on data.
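
The benchmark code itself is shown in the video, not in this description. As a rough sketch of how a "duration in seconds" figure could be collected, the helper below times two callables with `time.perf_counter`. The names `time_call` and `compare` are hypothetical, and the workloads are stand-ins so the sketch runs without either library installed.

```python
import time


def time_call(func, repeats=3):
    """Run func() `repeats` times and return the fastest duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return min(durations)


def compare(tests):
    """Time each (name, pandas_func, polars_func) triple and collect the results."""
    results = []
    for name, pandas_func, polars_func in tests:
        results.append(
            {
                "test": name,
                "pandas_s": time_call(pandas_func),
                "polars_s": time_call(polars_func),
            }
        )
    return results


# Stand-in workloads (assumption): replace with real Pandas and Polars calls.
demo = compare([("demo", lambda: sum(range(100_000)), lambda: sum(range(1_000)))])
print(demo)
```

In a real run, the two callables for Test 1 might be `lambda: pd.read_csv(path)` and `lambda: pl.read_csv(path)`, with `path` pointing at your own CSV file.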

I scored the two frameworks on the following functions:
- Test 1: Read a single CSV file.
- Tests 2 and 3: Select columns from a loaded dataframe (two approaches).
- Test 4: Filter data in a dataframe.
- Tests 5 and 6: Create a new column (two approaches).
- Test 7: Group and aggregate data.
- Test 8: Fill missing data.

I evaluated the competition in two groups:
1. Group where I did not use Lazy evaluation in Polars.
2. Group where I used Lazy evaluation in Polars.

From a high-level perspective, Polars represents data in memory with Arrow arrays, while Pandas represents data in memory with NumPy arrays. Polars also offers Lazy functionality, which can make it much faster. I mention this several times in the video (Polars has both Eager and Lazy APIs, while Pandas offers Eager only).

The content of the whole experiment is:
0:00 - Intro
1:08 - Introducing experiment Python code
12:05 - Run the experiment
18:04 - Experiment results (summary)
21:22 - Final test results

Additional material:

#polars #pandas #experiment

Thank you for watching this video. I really appreciate your time! If you liked it, please subscribe to the channel to get more useful content on data science, Python programming, artificial intelligence, machine learning, and related domains!


Enjoy!
Share your experience below this video! Thanks!

DataScienceGarage