Asynchronous Hyperparameter Optimization with Apache Spark - Jim Dowling & Moritz Meister

For the past two years, the open-source Hopsworks platform has used Spark to distribute hyperparameter optimization tasks for machine learning. Hopsworks provides some basic optimizers (grid search, random search, differential evolution) that propose combinations of hyperparameters (trials), which are run synchronously in parallel on executors as map functions. However, many such trials perform poorly, and we waste a lot of CPU and hardware accelerator cycles on trials that could be stopped early, freeing up those resources for other trials. In this talk, we present our work on Maggy, an open-source asynchronous hyperparameter optimization framework built on Spark that transparently schedules and manages hyperparameter trials, increasing resource utilization and massively increasing the number of trials that can be performed in a given period of time on a fixed amount of resources. Maggy also supports parallel ablation studies on Spark. We have commercial users evaluating Maggy, and we will report on the gains they have seen in reduced time to find good hyperparameters and improved utilization of GPU hardware. Finally, we will perform a live demo in a Jupyter notebook, showing how to integrate Maggy into existing PySpark applications (a rough sketch of what that looks like follows below).
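To make the idea concrete, here is a minimal sketch of driving an asynchronous hyperparameter search with Maggy from a PySpark driver. The names used (Searchspace, experiment.lagom, reporter.broadcast, the 'randomsearch' optimizer string) are taken from Maggy's documentation around the time of this talk and are an assumption here; exact signatures may differ between releases, so check the Maggy docs for your version.

# Sketch only: API names below are assumed from Maggy docs of this era
# and may differ in later releases.
from maggy import experiment, Searchspace

# Define the hyperparameters and the ranges the optimizer may explore.
sp = Searchspace(kernel=('INTEGER', [2, 8]),
                 dropout=('DOUBLE', [0.01, 0.5]))

def train(kernel, dropout, reporter):
    # Placeholder training loop; a real function would build and fit a
    # model with the given hyperparameters and compute a real metric.
    acc = 0.0
    for epoch in range(10):
        acc += (1.0 - dropout) / (10 + kernel)  # dummy metric
        # Stream the current metric back to the driver so poorly
        # performing trials can be stopped early and their executors
        # handed to new trials.
        reporter.broadcast(metric=acc)
    return acc

# Launch the trials asynchronously on Spark executors; 'randomsearch'
# is one of the bundled optimizers mentioned in the talk.
result = experiment.lagom(train,
                          searchspace=sp,
                          optimizer='randomsearch',
                          direction='max',
                          num_trials=20,
                          name='maggy_demo')
print(result)

The key point of the design is that the training function reports metrics as it goes, rather than only returning a final score, which is what lets the driver stop unpromising trials early and keep the cluster busy with new ones.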

About: Databricks provides a unified data analytics platform, powered by Apache Spark™, that accelerates innovation by unifying data science, engineering and business.
