filmov
tv
Koalas dataframe on SPARK = Pandas API supercharged!
Показать описание
Ever wondered why your CPU computes Python not on all cores/threads?
Solve this problem for your data engineering and data science tasks, including AI.
Achieve python code execution at 100% on all your CPU threads - with Koalas.
Koalas: a pandas equivalent API that works on Apache Spark.
Transfer your "single-node" & "in-memory" pandas API to a SPARK execution engine (distributed), with additional SQL and ML capabilities. Without mastering pySPARK.
By the way: pySPARK is the next logical step after Koalas - to improve your coding skills.
Minimize execution time (speed improvements) for your data analysis tools.
Plus code for a simple pyArrow SPARK optimizations to convert pandas df even faster.
Bench-marked w/ my Win10 PC on a single AMD CPU and no Nvidia GPU (no cuda cores).
BUT:
Does it really make sense to implement Koalas on a single-node PC, running SPARK? Officially No.
Recommended is, of course, a decent SPARK cluster (Databricks cluster running Databricks Runtime). Check out selected clouds like AWS, Microsoft Azure or Google Cloud!
Personal note:
For 10-100 GB file sizes: performance of my specific applications increase significantly with Koalas.
Increase your python performance (especially speed) with Koalas on SPARK.
#code_in_real_time
#real_time_coding
#JupyterLab
#Python
#SPARK
#Koalas
#dataframe
#pandas
#PySpark
Solve this problem for your data engineering and data science tasks, including AI.
Achieve python code execution at 100% on all your CPU threads - with Koalas.
Koalas: a pandas equivalent API that works on Apache Spark.
Transfer your "single-node" & "in-memory" pandas API to a SPARK execution engine (distributed), with additional SQL and ML capabilities. Without mastering pySPARK.
By the way: pySPARK is the next logical step after Koalas - to improve your coding skills.
Minimize execution time (speed improvements) for your data analysis tools.
Plus code for a simple pyArrow SPARK optimizations to convert pandas df even faster.
Bench-marked w/ my Win10 PC on a single AMD CPU and no Nvidia GPU (no cuda cores).
BUT:
Does it really make sense to implement Koalas on a single-node PC, running SPARK? Officially No.
Recommended is, of course, a decent SPARK cluster (Databricks cluster running Databricks Runtime). Check out selected clouds like AWS, Microsoft Azure or Google Cloud!
Personal note:
For 10-100 GB file sizes: performance of my specific applications increase significantly with Koalas.
Increase your python performance (especially speed) with Koalas on SPARK.
#code_in_real_time
#real_time_coding
#JupyterLab
#Python
#SPARK
#Koalas
#dataframe
#pandas
#PySpark