Pandas DataFrame: turbocharge it with PySpark on 12 CPU threads on a single node

Speed challenge:
Load 1.7 GB and 5.5 GB datasets for data science on a single-node machine and get maximum performance out of your laptop. How?

Compares the performance of operations on these pandas DataFrames against a single-node, stand-alone Spark setup running on 12 CPU threads.
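A minimal sketch of such a single-node setup, assuming PySpark is installed: Spark's local mode with the master URL `local[12]` runs everything inside one JVM using 12 worker threads. The helper names, the application name and the 8 GB driver-memory default are illustrative assumptions, not taken from the video.

```python
from typing import Optional


def local_master_url(threads: Optional[int] = None) -> str:
    """Return a local-mode Spark master URL such as 'local[12]'.

    With threads=None, 'local[*]' tells Spark to use all logical cores.
    """
    if threads is None:
        return "local[*]"
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return f"local[{threads}]"


def build_local_spark(threads: Optional[int] = 12, driver_memory: str = "8g"):
    """Create (or reuse) a single-node SparkSession with `threads` worker threads.

    In local mode the driver JVM also executes all tasks, so driver_memory
    bounds how much data Spark can hold in memory. The "8g" default is an
    assumption; size it to your machine. It only takes effect if no Spark
    JVM is running yet in this Python process.
    """
    # Imported inside the function so this file can be loaded without PySpark.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(local_master_url(threads))
        .appName("pandas-vs-pyspark")  # hypothetical application name
        .config("spark.driver.memory", driver_memory)
        .getOrCreate()
    )
```

Usage, under the same assumptions: `spark = build_local_spark(12)`; `spark.sparkContext.defaultParallelism` should then report 12. Passing `threads=None` gives `local[*]`, which uses every logical core instead of a fixed count.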

Beware:
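Lazy evaluation in Spark: transformations (such as filter or select) are not executed until an action (such as count() or collect()) is triggered.
Keep this in mind when running a speed test: trigger an action, otherwise you only time the building of the query plan!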
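To see the lazy-evaluation effect in a speed test, time the transformation and the action separately. This sketch assumes `sdf` is a PySpark DataFrame with a numeric column; the helper names and the filter itself are hypothetical examples.

```python
import time


def timed(label, fn):
    """Call fn() and return (result, elapsed_seconds), printing the timing."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


def benchmark_filter(sdf, column, threshold):
    """Time a Spark filter, separating lazy planning from actual execution.

    sdf is assumed to be a pyspark.sql.DataFrame with a numeric `column`.
    """
    # Transformation: Spark only records this step in the query plan; no rows are read.
    filtered, t_plan = timed("filter (lazy)", lambda: sdf.filter(sdf[column] > threshold))
    # Action: count() forces Spark to run the whole plan across its threads.
    n_rows, t_exec = timed("count() (action)", filtered.count)
    return n_rows, t_plan, t_exec
```

With a real DataFrame the first timing is typically close to zero, while the `count()` timing contains the actual work. Only the second number is a fair comparison against the equivalent pandas operation, which runs eagerly.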

Includes code to install Spark and convert a pandas DataFrame into a Spark DataFrame.
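A sketch of the conversion step, assuming an existing SparkSession `spark` and a pandas DataFrame `pdf`. Enabling Apache Arrow (Spark 3.x configuration keys shown) speeds up the pandas-to-Spark transfer; the function name is a hypothetical helper, and the install line is the usual pip route rather than necessarily the one used in the video.

```python
# Assumed one-time setup in a shell:  pip install pyspark pandas pyarrow
# (PySpark also needs a Java runtime on the machine.)


def pandas_to_spark(spark, pdf):
    """Convert a pandas DataFrame `pdf` into a Spark DataFrame via Apache Arrow.

    `spark` is an existing SparkSession. Arrow moves the data in columnar
    batches instead of converting it row by row, which is much faster for
    multi-GB frames.
    """
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # If a column type is not supported by Arrow, fall back to the slower
    # non-Arrow path instead of raising an error.
    spark.conf.set("spark.sql.execution.arrow.pyspark.fallback.enabled", "true")
    return spark.createDataFrame(pdf)
```

For example `sdf = pandas_to_spark(spark, pd.read_csv("data.csv"))`, where `data.csv` is a placeholder path. The same Arrow setting also accelerates the reverse direction, `sdf.toPandas()`.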

#code_your_own_AI
#code_in_real_time
#datascience
#computerscience
#spark
#pandasdataframe
#dataframe
#pyspark
#cpu
#databricks
#speedtest
#multicore