Distributed Deep Learning with Horovod on Ray - Travis Addair, Uber

Horovod is an open source framework created to make distributed training of deep neural networks fast and easy for TensorFlow, PyTorch, and MXNet models. Horovod's API makes it easy to take an existing training script and scale it to run on hundreds of GPUs, but provisioning a Horovod job with hundreds of GPUs can often be a challenge for users who lack access to HPC systems preconfigured with tools like MPI. The newly introduced Elastic Horovod API adds fault tolerance and auto-scaling capabilities, but requires further infrastructure scaffolding to configure. In this talk, you will learn how Horovod on Ray can be used to easily provision large distributed Horovod jobs and take advantage of Ray's auto-scaling and fault tolerance with Elastic Horovod out of the box. With Ray Tune integration, Horovod can further be used to accelerate your time-constrained hyperparameter search jobs. Finally, we'll show you how Ray and Horovod are helping to define the future of machine learning workflows at scale.
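At the heart of Horovod's data-parallel training is an allreduce: each worker computes gradients on its own data shard, then all workers average those gradients so every model replica applies the same update. A minimal pure-Python sketch of that averaging step (illustrative only; the function name `allreduce_average` is an assumption, not Horovod's API):

```python
# Conceptual sketch of the gradient-averaging allreduce Horovod performs
# during data-parallel training. Each inner list holds one worker's
# gradients; the result is the elementwise average across workers.

def allreduce_average(worker_grads):
    """Average per-worker gradient lists elementwise, as an allreduce would."""
    num_workers = len(worker_grads)
    # zip(*...) groups the i-th gradient from every worker together.
    return [sum(g) / num_workers for g in zip(*worker_grads)]

# Three workers, each with gradients for two parameters.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(allreduce_average(grads))  # [3.0, 4.0]
```

In practice Horovod implements this with an efficient ring-allreduce over NCCL or MPI rather than gathering gradients in one place; the sketch only shows the mathematical operation each training step relies on.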
20230329_Webinar: Distributed Deep Learning with Horovod
Distributed Deep Learning with Horovod on Ray - Travis Addair, Uber
Distributed Deep Learning with Horovod and Azure Databricks
Distributed Deep Learning using Tensorflow and Horovod
[Uber Seattle] Horovod: Distributed Deep Learning on Spark
Scale By The Bay 2018: Alex Sergeev, Distributed Deep Learning with Horovod
Distributed Deep Learning using Tensorflow and Horovod
[Uber Open Source ] Distributed Deep Learning with Horovod -- Alex Sergeev
Launch OpenMPI, Horovod and distributed deep learning jobs in a single click
An Uber Journey in Distributed Deep Learning
Horovod - Fast and Easy Distributed Deeplearning | 2019 BDM, WPI
Horovod: Distributed Deep Learning for Reliable MLOps at Uber - Travis Addair, Uber Technologies
A friendly introduction to distributed training (ML Tech Talks)
[Uber Open Summit 2018] Horovod: Distributed Deep Learning in 5 Lines of Python
Deep Learning at Scale with Horovod feat. Travis Addair | Stanford MLSys Seminar Episode 10
NVAITC Webinar: Multi-GPU Training using Horovod
Efficient Data Parallel Distributed Training with Flyte, Spark & Horovod
Distributed Machine Learning with Horovod on VMware vSphere with NVIDIA GPUs and PVRDMA
NEANIAS Core. Distributed Deep Learning by Horovod
17 How to use Keras, BERT, Horovod, Python, PySpark for distributed deep learning for classification
End-to-End Deep Learning with Horovod on Apache Spark
Distributed gradient descent exercise using a Horovod algorithm and PyTorch
PR-129: Horovod: fast and easy distributed deep learning in TensorFlow
Benchmarks + How-Tos of Convolutional Neural Network on HorovodRunner Enabled Apache Spark Clusters