Pandas DataFrame: turbocharge with PySpark on 12 CPU threads on a single node

Speed challenge:
Load 1.7 GB and 5.5 GB of data for data science on a single-node machine and get maximum performance out of your laptop. How?
We compare the performance of working with these pandas DataFrames against a single-node, standalone Spark setup running on 12 CPU threads.
Beware:
Spark uses lazy evaluation: execution does not start until an action is triggered.
Remember this when running a speed test: trigger an action!
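The lazy-evaluation caveat can be illustrated with a minimal sketch. It is not the code from the video: the helper names `timed` and `lazy_vs_action`, the synthetic `spark.range` data, and the row count are all assumptions made for illustration. The point is only that timing a transformation measures plan construction, while timing an action such as `count()` measures the real work.

```python
import time


def timed(label, fn):
    """Run fn(), print its wall-clock time, and return (result, seconds)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return result, elapsed


def lazy_vs_action(n_rows=1_000_000):
    """Hypothetical demo: time a lazy transformation, then the action that runs it."""
    # Imported here so the timing helper above also works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # local[12] asks Spark for 12 worker threads on this one machine.
    spark = (
        SparkSession.builder
        .master("local[12]")
        .appName("lazy-evaluation-demo")
        .getOrCreate()
    )
    df = spark.range(n_rows)  # synthetic data, a single column named "id"

    def build_plan():
        # Transformations only: Spark records a plan, no rows are processed yet.
        squared = df.withColumn("sq", F.col("id") * F.col("id"))
        return squared.filter(F.col("sq") % 2 == 0)

    plan, t_plan = timed("transformation (lazy)", build_plan)

    # count() is an action, so this is where the computation actually happens.
    n_even, t_action = timed("action (count)", plan.count)

    spark.stop()
    return n_even, t_plan, t_action
```

Calling `lazy_vs_action()` on a machine with PySpark and Java installed should show the transformation returning almost immediately, with nearly all of the time spent in the action. A speed test that stops before the action would be timing the wrong thing.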
Includes code to install Spark and convert a pandas DataFrame to a Spark DataFrame.
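For readers without access to the video, the installation and conversion step might look roughly like the sketch below. It is an assumption-laden illustration, not the author's code: `local_master` and `pandas_to_spark` are hypothetical helper names, and the Arrow setting is an optional speed-up that assumes `pyarrow` is installed.

```python
# Install once from a shell (PySpark also needs a Java runtime on the machine):
#   pip install pyspark pandas pyarrow


def local_master(threads=12):
    """Build the master URL for a single-node Spark run with `threads` threads."""
    return f"local[{threads}]"


def pandas_to_spark(pdf, threads=12):
    """Hypothetical helper: start a local Spark session and convert a pandas DataFrame."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(local_master(threads))
        .appName("pandas-to-spark")
        # Optional: use Apache Arrow for faster pandas <-> Spark conversion.
        .config("spark.sql.execution.arrow.pyspark.enabled", "true")
        .getOrCreate()
    )
    # createDataFrame accepts a pandas DataFrame and infers the Spark schema.
    sdf = spark.createDataFrame(pdf)
    return spark, sdf
```

A call such as `spark, sdf = pandas_to_spark(pdf)` followed by `sdf.count()` would convert an existing pandas DataFrame `pdf` and then trigger an action on the result. The conversion itself copies the data from pandas into Spark, so for a fair comparison it is worth timing it separately from the operations under test.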
#code_your_own_AI
#code_in_real_time
#datascience
#computerscience
#spark
#pandasdataframe
#dataframe
#pyspark
#cpu
#databricks
#speedtest
#multicore