Pandas DataFrames on your GPU w/ cuDF

An overview and some quick examples of using cuDF's pandas accelerator, and how much faster it can be than vanilla pandas for data analysis.
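For context, the "single flag" enablement that the comments below refer to looks roughly like this (a sketch based on the RAPIDS `cudf.pandas` documentation; without a GPU and cuDF installed, the import below is just ordinary pandas):

```python
# In a Jupyter notebook, enable the accelerator BEFORE importing pandas:
#   %load_ext cudf.pandas
# Or run an unmodified script through the accelerator from the command line:
#   python -m cudf.pandas my_script.py
import pandas as pd  # with the extension loaded, this is transparently GPU-backed

df = pd.DataFrame({"city": ["A", "B", "A"], "price": [10.0, 20.0, 30.0]})
print(df["price"].sum())  # same pandas API either way
```

The point of the accelerator is that the script itself needs no changes: operations cuDF supports run on the GPU, and anything unsupported falls back to CPU pandas.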

Comments
Author

Enabling cuDF using a single flag is insane! However, I just wanted to point out (especially for new pandas users) that the proper way to calculate the average price per city in pandas is with groupby. Running that in plain pandas is blazing fast (a few ms), nothing compared to 19 minutes. That doesn't mean cuDF isn't useful, but don't forget that using plain pandas properly can get you a long way.
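The groupby approach this comment describes can be sketched as follows (the `city` and `price` column names and the sample values are assumptions, standing in for the video's dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "city":  ["Austin", "Boston", "Austin", "Boston"],
    "price": [100.0, 200.0, 300.0, 400.0],
})

# One vectorized pass instead of a Python-level loop over each city:
avg_price = df.groupby("city")["price"].mean()
print(avg_price)  # Austin -> 200.0, Boston -> 300.0
```

Because groupby dispatches to compiled code, it avoids the per-row Python overhead that makes a hand-written loop take minutes on large data.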

HarrisBallis
Author

Hey Sentdex, can you take this video down so my manager doesn't find out that I sped up the entire codebase by 200 fold with just one line and I end up getting appreciation bonuses??

Jokes aside, this is absolutely wild. What a gamechanger. Thanks a lot as always, Kevin!

bantaibaman
Author

Awesome video! I encountered a similar issue where I had to process ~8 GB of data using an AWS Lambda (limited RAM and time). I used Polars (a pandas alternative written from scratch in Rust for performance) and found it to be blazing fast. It's really, really useful, especially on non-NVIDIA devices like my Raspberry Pi and the AWS Lambda function. You should definitely check it out!

harshvaragiya
Author

Wonderful! Thank you. A comparison with the Polars library would be interesting as well.

perryholman
Author

Once again thank you for sharing :-) You are appreciated.

kenchang
Author

Missing your tutorials man, trying to install this on Windows...

Brickkzz
Author

For the read_csv operation, I would be curious what is actually taking the most time with the pandas object. I suspect it's building the Python string objects, and if so, I wonder whether it would be much faster if you have PyArrow installed and enabled.

And in general it makes sense that strings are slow in pandas, because it falls back to looking up a Python object via a reference; it's actually a much more interesting comparison for numeric or datetime data types. For strings, it would be much more interesting if you had used the PyArrow string data type.

damianshaw
Author

Thanks for sharing... I would be curious about a comparison between the accelerated version of pandas and Polars.

__python__
Author

Outstanding. Thank you for this information.

jameslucas
Author

Thanks a lot for sharing. super useful <3

usamatahir
Author

Hello Sentdex, I am reaching out to you regarding your Neural Networks from Scratch series.
Any updates on that? You left off at part 9.
Please do continue, it's an awesome series.

And any updates on book discounts for Black Friday?

Please do help.

noormohammedshikalgar
Author

the kubota warrior is back with the heat 🗣🗣🗣

acelaox
Author

3:38 — it doesn't have the prices "in quotes like a string"; it's a properly exported CSV that has ALL fields quoted. Your pd.read_csv is missing quoting=csv.QUOTE_ALL (or just quoting=1) and optionally quotechar='"'. The only "magic" pandas is doing is interpreting that column as quoted. If you add those options, I'm guessing cuDF will run just as well, since the ingest portion will still be using the Python standard library, or at least pandas' C implementation.
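For reference, `pd.read_csv` does accept the csv module's quoting constants, so the commenter's suggestion can be sketched on a tiny fully-quoted file like this (column names and values are hypothetical; treat the claim that the flag changes parsing behavior as the commenter's hypothesis, since pandas' default parser also strips quotes):

```python
import csv
import io
import pandas as pd

# A "properly exported" CSV where ALL fields are quoted:
quoted_csv = io.StringIO('"city","price"\n"Austin","100.5"\n"Boston","200.25"\n')

# Tell the parser every field is quoted, as the comment suggests:
df = pd.read_csv(quoted_csv, quoting=csv.QUOTE_ALL, quotechar='"')

print(df.dtypes)  # the quoted numbers parse as float64, not as strings
```

Either way, the quotes are stripped during parsing and normal type inference runs afterward, which is why the "magic" the video attributes to pandas is really just standard CSV quoting rules.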

Mil-Keeway
Author

Great video. Just one thing: instead of comparing cuDF with vanilla Pandas, wouldn’t a comparison with Modin be a more appropriate one?

thetdg
Author

Please post videos more often, Harrison

oguzhanyldrm
Author

What if my RAM (128GB) is larger than my VRAM (32GB)? Is normal pandas still faster for data that's larger than the VRAM?

maurice
Author

@sentdex sir, please make videos on 3D deep learning; it's really exciting to see your work on point clouds

shashisaini
Author

How about Mojo? Mojo can use the GPU to accelerate computation too; currently Mojo supports NumPy and pandas on the CPU. It would be fun to make a comparison with cuDF. Mojo is more like a superset of Python.

BohonChina
Author

What's the reasoning for not using groupby in this demo? Wouldn't that be the more natural and faster pandas method to use, instead of looping over everything?

It feels a little disingenuous to compare against poorly optimised pandas code that no one would actually write.

EarlZMoade
Author

jesus christ my life has totally changed

AlignmentLabAI