Alpa: Automating Inter- and Intra- Operator Parallelism for Distributed Deep Learning

SAMPL Talk 2022/03/03

Title: Alpa: Automating Inter- and Intra- Operator Parallelism for Distributed Deep Learning
Presenter: Lianmin Zheng (UC Berkeley)

Abstract: Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations, which does not suffice to scale out complex DL models on distributed compute devices. Alpa distributes the training of large DL models by viewing parallelism at two hierarchical levels: inter-operator and intra-operator parallelism. Based on this view, Alpa constructs a new hierarchical space of massive model-parallel execution plans. Alpa designs a number of compilation passes to automatically derive the optimal parallel execution plan at each independent parallelism level, and implements an efficient runtime to orchestrate the two-level parallel execution on distributed compute devices. Our evaluation shows Alpa generates parallelization plans that match or outperform hand-tuned model-parallel training systems, even on the models those systems are designed for. Unlike specialized systems, Alpa also generalizes to models with heterogeneous architectures and to models without manually-designed plans.
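To make the hierarchical plan space concrete, here is a toy sketch (not Alpa's actual algorithms, which use dynamic programming and integer linear programming over real cost models): the inter-operator level splits a chain of layers into contiguous pipeline stages, and the intra-operator level picks a sharding degree for each stage under a made-up cost model. All names, layer costs, and candidate sharding degrees below are hypothetical.

```python
# Toy illustration of a two-level (inter-/intra-operator) plan search.
# This is NOT Alpa's implementation; it only shows the hierarchical structure:
# an outer search over pipeline stage partitions, and an inner search over
# per-stage sharding degrees.
from itertools import combinations

layer_flops = [4.0, 2.0, 2.0, 4.0]  # hypothetical per-layer work

def stage_partitions(n_layers, n_stages):
    """Yield every way to split layers 0..n_layers-1 into contiguous stages."""
    for cuts in combinations(range(1, n_layers), n_stages - 1):
        bounds = (0,) + cuts + (n_layers,)
        yield [list(range(bounds[i], bounds[i + 1])) for i in range(n_stages)]

def intra_op_cost(stage, shard):
    """Toy cost model: compute shrinks with sharding, communication grows."""
    compute = sum(layer_flops[l] for l in stage) / shard
    comm = 0.1 * (shard - 1) * len(stage)
    return compute + comm

def best_plan(n_layers, n_stages):
    """Outer loop: inter-op partitions; inner loop: intra-op sharding."""
    best = None
    for stages in stage_partitions(n_layers, n_stages):
        total, plan = 0.0, []
        for stage in stages:
            cost, shard = min(
                (intra_op_cost(stage, s), s)
                for s in (1, 2, 4)  # candidate sharding degrees
            )
            total += cost  # pipeline cost simplified to a plain sum
            plan.append((stage, shard))
        if best is None or total < best[0]:
            best = (total, plan)
    return best

cost, plan = best_plan(n_layers=4, n_stages=2)
```

In the real system, the inner problem is solved per stage by an ILP over operator sharding specs, the outer problem by dynamic programming over device-mesh assignments, and the two levels remain largely independent, which is what makes the search tractable.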

Bio: Lianmin is a third-year Ph.D. student in the EECS department at UC Berkeley, advised by Ion Stoica and Joseph E. Gonzalez. His research interests lie in the intersection of machine learning and programming systems, especially domain-specific compilers for accelerated and scalable deep learning.

-
SAMPL is an interdisciplinary machine learning research group exploring problems spanning multiple layers of the system stack including deep learning frameworks, specialized hardware for training and inference, new intermediate representations, differentiable programming, and various applications. We are part of the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Our group is a collaboration between researchers from Sampa, Syslab, MODE, and PLSE.

