Alpa: Automating Inter- and Intra- Operator Parallelism for Distributed Deep Learning

SAMPL Talk 2022/03/03

Title: Alpa: Automating Inter- and Intra- Operator Parallelism for Distributed Deep Learning
Presenter: Lianmin Zheng (UC Berkeley)

Abstract: Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations, which does not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelism at two hierarchical levels: inter-operator and intra-operator parallelism. Based on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive the optimal parallel execution plan at each independent parallelism level, and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems, even on the models those systems are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually-designed plans.
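To make the hierarchical plan space concrete, here is a toy sketch (not Alpa's actual algorithms, which use dynamic programming and integer linear programming over real cost models): the inter-operator level splits a chain of layers into contiguous pipeline stages, and the intra-operator level picks a sharding degree for each stage under a made-up cost model. All names, layer costs, and candidate sharding degrees below are hypothetical.

```python
# Toy illustration of a two-level (inter-/intra-operator) plan search.
# This is NOT Alpa's implementation; it only shows the hierarchical structure:
# an outer search over pipeline stage partitions, and an inner search over
# per-stage sharding degrees.
from itertools import combinations

layer_flops = [4.0, 2.0, 2.0, 4.0]  # hypothetical per-layer work

def stage_partitions(n_layers, n_stages):
    """Yield every way to split layers 0..n_layers-1 into contiguous stages."""
    for cuts in combinations(range(1, n_layers), n_stages - 1):
        bounds = (0,) + cuts + (n_layers,)
        yield [list(range(bounds[i], bounds[i + 1])) for i in range(n_stages)]

def intra_op_cost(stage, shard):
    """Toy cost model: compute shrinks with sharding, communication grows."""
    compute = sum(layer_flops[l] for l in stage) / shard
    comm = 0.1 * (shard - 1) * len(stage)
    return compute + comm

def best_plan(n_layers, n_stages):
    """Outer loop: inter-op partitions; inner loop: intra-op sharding."""
    best = None
    for stages in stage_partitions(n_layers, n_stages):
        total, plan = 0.0, []
        for stage in stages:
            cost, shard = min(
                (intra_op_cost(stage, s), s)
                for s in (1, 2, 4)  # candidate sharding degrees
            )
            total += cost  # pipeline cost simplified to a plain sum
            plan.append((stage, shard))
        if best is None or total < best[0]:
            best = (total, plan)
    return best

cost, plan = best_plan(n_layers=4, n_stages=2)
```

In the real system, the inner problem is solved per stage by an ILP over operator sharding specs, the outer problem by dynamic programming over device-mesh assignments, and the two levels remain largely independent, which is what makes the search tractable.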

Bio: Lianmin is a third-year Ph.D. student in the EECS department at UC Berkeley, advised by Ion Stoica and Joseph E. Gonzalez. His research interests lie in the intersection of machine learning and programming systems, especially domain-specific compilers for accelerated and scalable deep learning.

-
SAMPL is an interdisciplinary machine learning research group exploring problems spanning multiple layers of the system stack including deep learning frameworks, specialized hardware for training and inference, new intermediate representations, differentiable programming, and various applications. We are part of the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Our group is a collaboration between researchers from Sampa, Syslab, MODE, and PLSE.

