The BEST library for building Data Pipelines...

Building data pipelines with #python is an important skill for data engineers and data scientists. But which library is best? In this video we look at three options: pandas, polars, and Spark (PySpark).

Timeline:
00:00 Data Pipelines
01:11 The Data
02:32 Pandas
04:34 Polars
06:15 PySpark
09:15 Spark SQL
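The timeline above walks through the same kind of aggregation in each tool. As a rough sketch of the pipeline step being compared (hypothetical column names and values; the video's actual flight dataset is not reproduced here), the pandas version might look like:

```python
import pandas as pd

# Hypothetical flight-style data standing in for the video's dataset.
flights = pd.DataFrame({
    "airline": ["AA", "AA", "DL", "DL"],
    "delay_minutes": [10, 30, 5, 15],
})

# A typical pipeline step: group by a key column and aggregate.
avg_delay = flights.groupby("airline")["delay_minutes"].mean()
print(avg_delay.to_dict())  # {'AA': 20.0, 'DL': 10.0}
```

The same group-by/aggregate shape carries over to polars and PySpark; the libraries differ mainly in execution model and overhead, not in what the step expresses.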

My other videos:

#python #polars #spark #dataengineering
Comments

If you enjoyed this video please consider subscribing and check out some of my videos on similar topics:

robmulla

These are phenomenal, I especially like these short 10-15 min videos. Thanks a lot for sharing all these relevant and up-to-date topics!

anchyzas

One thing you said implicitly is quite important: the footprint of polars is way smaller than pandas', which suggests polars may be a good choice for edge or serverless computing. In those cases I often refrain from using pandas because of the resources needed and the startup time, and I end up doing funny stuff with dicts, classes, tuples… I'm considering exploring polars for that.

riessm

Great video! Always curious about Spark and this gave a great overview of these 3 tools! 💡

joseortiz_io

Thanks for such awesome content. I love polars and have been trying it since your video came out; it would be nice to see you use it in a data exploration video :D

tonyle

Another great video! Thanks Rob! Looking forward to the next stream

fee-f-foe-fum

Hey Rob, huge fan of your work, keep rolling😀

shivayshakti

Rob, thank you! It's almost as if you read minds! This video went above and beyond here! I'd been toying with trying a local session of Spark, and thanks to you, now have the impetus to give it a go!

DarthJarJar

Great introduction video! Thank you!
Looks like most of the PySpark time went to initializing the session itself; as far as I understand, the session is created once and then reused by later getOrCreate() calls. But anyway, for bigger pipelines Spark will work faster.

arturabizgeldin

It was a great video and very useful. Adding Spark to the mix was just awesome! For a next video, covering duckdb and its benefits vs polars, or maybe duckdb alongside polars, would be great! The founder of duckdb said that for most companies it is enough, so testing and discussing that claim would also be great. Duckdb is said to use vectorized execution; a discussion of how vectorized execution is faster or better would also be great. Thanks!

TheSiddhaartha

Hi Rob and thanks for the excellent work, I enjoy each of your videos!
I would be interested in a video explaining how to chain several machine learning libraries pulled from GitHub, for example: object detection + keypoint estimation + person identification. Also, how to manage compatible library versions across repos that have different (incompatible) requirements.
Thanks!

jorislimonier

I like these types of videos as they clear up all confusion.

prashlovessamosa

Thanks for the educational content Rob

aminehadjmeliani

Thanks for the great video! I'd like to see a comparison with other distributed Python libraries, such as Modin. Thanks!

somerset

I really like your content. Absolutely grade A+

aabbassp

Great video! I have a Junior Data Engineer interview coming up and I'm stressed. I don't have any previous working experience in this field. I feel somewhat confident in SQL and Pandas and have been practicing on Strata Scratch. I absolutely hate the Data Structures and Algorithms type of questions like the ones on leetcode and I can't even answer the easy ones. I'm worried that my interview will have those kinds of coding problems. My initial goal was to become a Data Analyst but decided to apply for Data Engineer since it is a junior position.

chillvibe

Excellent, great content. Thanks for sharing.

peterluo

Hey Rob, this was a great video - clear and concise. Could you explain how you would set up an analysis that would run regularly as the data changed? For example, the flight data you used in this example, let's say that was updated once a week and you needed to update the aggregate stats, and maybe even track the aggregates over time. Thanks!

steve_dunlop

Very good video. Could you please make more advanced polars videos? I have started switching from pandas to polars and I really want to learn how to do more advanced things with it.

Alexander-pktu

Hi Rob, wonderful video as always! Can you make a video on how to deploy a trained machine learning model (maybe the XGBoost forecaster you made) using Docker?

Arkantosi