Python Pandas vs Julia Dataframes

A performance comparison between Python and Julia for common operations on structured data.
Comments

Thank you! This is exactly the type of stuff I was looking for to finally decide to try out Julia.

bryan

And this is why I've moved to Julia.

InfiniteQuest

Thank you for sharing. Very interesting.

EdwardDowllar

That's cool. But tbh there are two things we need to know:
1) People can't perceive the difference between 1 ms and 200 ms. Both feel immediate. The gap only becomes significant for extremely big datasets.
2) Julia has to compile each function the first time it is called in a session. So you have to wait seconds before it reaches its super-fast performance. That applies to every function called for the first time.
So literally, if we restart the kernel, delete all the benchmarks and then run all cells, pandas will finish far earlier.

TheSkyInFire
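
The trade-off raised in the comment above can be put as a back-of-the-envelope calculation. The sketch below is illustrative only: the 1 ms and 200 ms per-call times are the comment's own example figures, the 5-second compile cost is a made-up placeholder rather than a measurement, and `break_even_calls` is a hypothetical helper name.

```python
import math


def break_even_calls(compile_s: float, fast_call_s: float, slow_call_s: float) -> int:
    """Smallest number of calls after which the compiled option is ahead overall.

    Total time with a one-off compile step: compile_s + n * fast_call_s
    Total time without a compile step:      n * slow_call_s
    The compiled option wins once n > compile_s / (slow_call_s - fast_call_s).
    """
    if slow_call_s <= fast_call_s:
        raise ValueError("the compiled option must be faster per call")
    # floor(x) + 1 is the smallest integer strictly greater than x
    return math.floor(compile_s / (slow_call_s - fast_call_s)) + 1


# Placeholder numbers: 5 s compile (assumed), 1 ms vs 200 ms per call (from the comment)
print(break_even_calls(compile_s=5.0, fast_call_s=0.001, slow_call_s=0.2))  # 26
```

Under these assumed numbers the one-off compile cost is recovered after a few dozen repeated calls, so which tool finishes first depends on how often the same operation is rerun within one session.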

Hi James. Thank you for posting this performance comparison. On my machine, the performance of reading CSV files is quite similar.

Reading ratings in julia: 5.441 s (329 allocations: 1.79 GiB)
Reading ratings in python: 5533.5729122161865 msecs

These are the system params:
OS system : Linux
OS Name : posix
OS Version : 5.15.0-39-generic

python version : 3.9.12
numpy version : 1.22.2

Julia 1.7.3

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04 LTS"

Selected Jupyter core packages...
IPython : 8.2.0
ipykernel : 6.9.1
ipywidgets : 7.6.5
jupyter_client : 6.1.12
jupyter_core : 4.9.2
jupyter_server : 1.13.5
jupyterlab : 3.3.2
nbclient : 0.5.13
nbconvert : 6.4.4
nbformat : 5.3.0
notebook : 6.4.8
qtconsole : 5.3.0
traitlets : 5.1.1

juangoog

Brilliant! Do you plan on doing a Python vs Julia comparison for training machine learning models? That'd be lovely.

lucasfranciscosantos

I hope Julia replaces Python in the future. This is revolutionary.

greendsnow

Great video! I've been a fan of Julia since 2013.

yt

Why are you dividing the difference by the time it took in pandas? A more meaningful stat would be pandas' time divided by DataFrames' time. That way we would know that DataFrames is x times faster than pandas.

chadwinters
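
The two statistics contrasted in the comment above can be written out side by side. This is a minimal sketch, assuming the video's figure is (pandas time - DataFrames time) / pandas time; the function names and the sample timings are hypothetical and not taken from the video.

```python
def relative_saving(pandas_s: float, dataframes_s: float) -> float:
    """Fraction of the pandas time that is saved: (pandas - DataFrames) / pandas."""
    return (pandas_s - dataframes_s) / pandas_s


def speedup(pandas_s: float, dataframes_s: float) -> float:
    """How many times faster DataFrames is: pandas / DataFrames."""
    return pandas_s / dataframes_s


# Hypothetical timings in seconds, chosen only to make the arithmetic easy to follow
pandas_s, dataframes_s = 2.0, 0.5
print(relative_saving(pandas_s, dataframes_s))  # 0.75, i.e. 75% of the time saved
print(speedup(pandas_s, dataframes_s))          # 4.0, i.e. 4x faster
```

Both numbers carry the same information, since speedup = 1 / (1 - relative saving), but the relative saving is capped at 1, so large speedups all bunch up just below 100%.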

Thanks for the video. Well organized and clear. The results speak for themselves. Time to use Julia and DataFrames.

cescalan

Very good knowledge sharing, thank you

abcdf

Can you please test the Modin package against pandas to compare them?

andranikarakelov

Wow, that's really impressive. Do you suggest any online course to get started with Julia?

giuliko

Sorry, but I don't get the point. I mean, it's clearly noticeable that GC takes so much time that pandas is still able to give out the results quicker, so for data science purposes I don't see any reason for switching.

thackserver