DuckDB vs Pandas vs Polars For Python devs

In this video, @mehdio will do a walkthrough of DuckDB, Polars and Pandas. We will discuss the main features and dive into a pragmatic code example.

0:00 Intro
0:34 What is DuckDB
2:46 What is Pandas
3:45 What is Polars
5:12 Code project
6:14 Install & dependencies
7:18 Versatility
8:18 Syntax
9:26 Performance
10:43 Takeaways

#duckdbvspandas #duckdbvspolars #dataengineering #polarsvsduckdb #polarsvspandas #pandasvsduckdb #pandasvspolars
Comments

DuckDB is the most underused and underrated Python library. I started using it a couple of weeks ago and I'm blown away by the efficiency gain over Pandas. Plus, SQL is easier, and it forces you to think in vectorized operations rather than being tempted by Pandas' built-in loop methods, which are super slow.

Shawn-crep

I appreciate the nods to the R community going on in here. Great video!

porlando

I’ve heard a lot about Duck 🦆 DB and must use it some day 😂

rembautimes

Well, I had just started to learn Polars, but your video and another one comparing DuckDB and Polars are making me doubt my choice… DuckDB seems MUCH faster. Besides, SQL knowledge can be leveraged for everything. Why would one use pandas or Polars over DuckDB? Am I missing something?

MrRubix

How about DuckDB and SQLAlchemy? Do they shake hands? Can I use an ORM this way?

Emotekofficial

How dare you use the VOICE on me! Nice video tho.

temetnosce

Is DuckDB a query language, a real database like SQLite, or both?

fv

Thank you for this valuable content!
Can you also explain Parquet datasets?
I used to create partitioned Parquet datasets by using Pandas and Polars.

But I want to know how to read data from such partitioned Parquet datasets directly into a Polars LazyFrame (not into pandas, since the data is larger than memory) to do some analytics.

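import polars as pl
import pyarrow.parquet as pq

# Read the partitioned Parquet dataset eagerly into an Arrow table
# (pd_df_schema is a pyarrow schema defined earlier in the original script)
pq_df = pq.read_table(
    r"C:\Users\test_pl",
    schema=pd_df_schema,
)

# Convert the Arrow table to a Polars DataFrame
pl_df = pl.from_arrow(pq_df)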

Is there any better way to do this?

kpyoutuber

I guess I'm stating the obvious, but for anyone who doesn't use SQL for data operations, DuckDB is second-class. And I surely don't like using SQL for transformations and such.

allthingsdata