Efficient Distributed Hyperparameter Tuning with Apache Spark
Hyperparameter tuning is a key step in achieving and maintaining optimal performance from Machine Learning (ML) models. Today, many open-source frameworks automate the process, employing statistical algorithms to search the parameter space efficiently. However, optimizing these parameters over a sufficiently large dataset or search space can be computationally infeasible on a single machine. Apache Spark is a natural candidate to accelerate such workloads, but naive parallelization can actually impede the overall search speed and accuracy.
In this talk, we’ll discuss how to efficiently leverage Spark to distribute a tuning workload and go over some common pitfalls. Specifically, we’ll provide a brief introduction to tuning and the motivation for moving to a distributed workflow. Next, we’ll cover best practices for using Spark with Hyperopt – a popular, flexible, open-source tool for hyperparameter tuning – including how to distribute the training data and size the cluster appropriately for the problem at hand. We’ll also touch on the tension between parallel computation and Sequential Model-Based Optimization methods, such as the Tree-structured Parzen Estimators implemented in Hyperopt. Afterwards, we’ll demonstrate these practices with Hyperopt’s SparkTrials API. Additionally, we’ll showcase joblib-spark, an extension our team recently developed that uses Spark as a distributed backend for scikit-learn to accelerate both tuning and training.
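As a minimal sketch of the SparkTrials pattern described above (the dataset, search space, objective, and parallelism value are illustrative choices, not specifics from the talk), a distributed tuning run might look like this:

import numpy as np
from hyperopt import fmin, tpe, hp, SparkTrials, STATUS_OK
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)

# Illustrative search space: log-uniform regularization strength.
space = hp.loguniform("C", np.log(1e-3), np.log(1e3))

def objective(C):
    model = LogisticRegression(C=C, max_iter=1000)
    # Hyperopt minimizes, so return negated cross-validation accuracy.
    acc = cross_val_score(model, X, y, cv=3).mean()
    return {"loss": -acc, "status": STATUS_OK}

# parallelism caps how many trials run concurrently on the cluster.
# Higher values finish sooner but give TPE fewer completed trials to
# learn from - the parallelism/adaptivity trade-off noted above.
trials = SparkTrials(parallelism=4)
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=32, trials=trials)

And a similarly hedged sketch of joblib-spark: once the backend is registered, any joblib-parallelized scikit-learn routine (here GridSearchCV, with an illustrative grid) can fan its fits out as Spark tasks:

from joblib import parallel_backend
from joblibspark import register_spark
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

register_spark()  # make "spark" available as a joblib backend

X, y = load_digits(return_X_y=True)
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", "auto"]}  # illustrative
search = GridSearchCV(SVC(), param_grid, cv=3)

# Each cross-validation fit runs as a Spark task instead of a local process.
with parallel_backend("spark", n_jobs=8):
    search.fit(X, y)

Both snippets assume a running SparkSession (created automatically on Databricks, or via pyspark.sql.SparkSession.builder elsewhere).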
This talk will be generally accessible to those familiar with ML and particularly useful for those looking to scale up their training with Spark.