Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

A complete tutorial on how to train a model on multiple GPUs or multiple servers.
I first describe the difference between Data Parallelism and Model Parallelism. Later, I explain the concept of gradient accumulation (including all the maths behind it). Then we get to the practical tutorial: first we create a cluster on Paperspace with two servers (each having two GPUs), and then we train a model in a distributed manner on the cluster.
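For readers who want the gist in code, here is a minimal sketch of gradient accumulation (model, dataloader, loss_fn, optimizer, and accumulation_steps are placeholder names, not the exact code from the video): gradients from several micro-batches are summed locally before a single optimizer step, emulating a larger effective batch size.

```python
# Gradient accumulation: sum gradients over several micro-batches, then take
# one optimizer step (effective batch = micro-batch size * accumulation_steps).
accumulation_steps = 4

for step, (inputs, targets) in enumerate(dataloader):
    outputs = model(inputs)
    # Scale the loss so the accumulated gradient equals the gradient of the
    # mean loss over the whole effective batch.
    loss = loss_fn(outputs, targets) / accumulation_steps
    loss.backward()  # gradients add up in param.grad across micro-batches

    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one weight update per effective batch
        optimizer.zero_grad()  # reset the accumulated gradients
```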
We will explore the collective communication primitives Broadcast, Reduce, and All-Reduce, and the algorithms behind them.
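As a quick reference, these primitives map directly onto torch.distributed calls. The snippet below is a hedged sketch: it assumes the process group has already been initialized (for example by torchrun) and uses small CPU tensors for illustration (with the nccl backend they would need to live on the local GPU).

```python
import torch
import torch.distributed as dist

rank = dist.get_rank()

# Broadcast: rank 0 sends its tensor to every other rank (e.g. initial weights).
params = torch.arange(4.0) if rank == 0 else torch.zeros(4)
dist.broadcast(params, src=0)

# Reduce: sum every rank's tensor into the destination rank only.
grad = torch.ones(4) * rank
dist.reduce(grad, dst=0, op=dist.ReduceOp.SUM)

# All-Reduce: sum every rank's tensor and leave the result on all ranks.
# This is the operation DDP uses to average gradients during backprop.
grad = torch.ones(4) * rank
dist.all_reduce(grad, op=dist.ReduceOp.SUM)
```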
I also provide a template showing how to integrate DistributedDataParallel into your existing training loop.
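The sketch below shows the typical shape of such a template, not the exact script from the video; MyModel, train_dataset, num_epochs, and loss_fn are placeholders for your own code.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler

def main():
    dist.init_process_group(backend="nccl")      # torchrun provides the env vars
    local_rank = int(os.environ["LOCAL_RANK"])   # GPU index on this node
    torch.cuda.set_device(local_rank)

    model = MyModel().cuda(local_rank)           # placeholder model
    model = DDP(model, device_ids=[local_rank])  # wrap: gradients get all-reduced

    sampler = DistributedSampler(train_dataset)  # each rank sees its own shard
    loader = DataLoader(train_dataset, batch_size=32, sampler=sampler)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for epoch in range(num_epochs):
        sampler.set_epoch(epoch)                 # reshuffle shards every epoch
        for inputs, targets in loader:
            inputs = inputs.cuda(local_rank)
            targets = targets.cuda(local_rank)
            loss = loss_fn(model(inputs), targets)
            optimizer.zero_grad()
            loss.backward()                      # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

On each node, the launch looks something like `torchrun --nproc_per_node=2 --nnodes=2 --node_rank=<0 or 1> --rdzv_endpoint=<master_ip>:29500 train.py`; torchrun then sets RANK (the global rank across all nodes) and LOCAL_RANK (the rank within one node) as environment variables.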
In the last part of the video we review advanced topics, like bucketing and computation-communication overlap during backpropagation.
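Both of these, together with the no_sync context from the chapter list, surface in the DistributedDataParallel API. The fragment below is a rough sketch that reuses the placeholder names from the template above (model starts as an unwrapped module; accumulation_steps is assumed).

```python
# Bucketing: DDP groups parameter gradients into buckets (bucket_cap_mb, about
# 25 MB by default) so the all-reduce of one finished bucket can overlap with
# the backprop computation of the layers that are still pending.
model = DDP(model, device_ids=[local_rank], bucket_cap_mb=25)

# no_sync: when combining DDP with gradient accumulation, skip the gradient
# all-reduce on intermediate micro-batches and synchronize only on the last one.
for step, (inputs, targets) in enumerate(loader):
    inputs, targets = inputs.cuda(local_rank), targets.cuda(local_rank)
    if (step + 1) % accumulation_steps != 0:
        with model.no_sync():   # forward + backward with no communication
            loss = loss_fn(model(inputs), targets) / accumulation_steps
            loss.backward()
    else:
        loss = loss_fn(model(inputs), targets) / accumulation_steps
        loss.backward()         # this backward triggers the all-reduce
        optimizer.step()
        optimizer.zero_grad()
```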

Chapters
00:00:00 - Introduction
00:02:43 - What is distributed training?
00:04:44 - Data Parallelism vs Model Parallelism
00:06:25 - Gradient accumulation
00:19:38 - Distributed Data Parallel
00:26:24 - Collective Communication Primitives
00:28:39 - Broadcast operator
00:30:28 - Reduce operator
00:32:39 - All-Reduce
00:33:20 - Failover
00:36:14 - Creating the cluster (Paperspace)
00:49:00 - Distributed Training with TorchRun
00:54:57 - LOCAL RANK vs GLOBAL RANK
00:56:05 - Code walkthrough
01:06:47 - No_Sync context
01:08:48 - Computation-Communication overlap
01:10:50 - Bucketing
01:12:11 - Conclusion
Comments

This is the best video about Torch distributed I have ever seen. Thanks for making this video!

thinhon

I really love your videos. You have a natural talent for simplifying logic and code, in the same capacity as Andrej.

abdallahbashir

This is the second video I've watched from this channel, after "quantization", and frankly I wanted to express my gratitude for your work: it is very easy to follow, and the level of abstraction makes it tenable to understand the concepts holistically.

КириллКлимушин

Great video, thanks for creating this. I have used DDP quite a lot, but seeing the visualizations for communication overlap helped me build a very good mental model.
Would love to see more content around distributed training: DeepSpeed ZeRO, Megatron DP + TP + PP.

chiragjn

Starting to watch my third video on this channel, after transformer from scratch and quantization. Thank you for the great content, and also for the code and notes to look back on.

amishasomaiya

That's an amazing resource! It's great to see you sharing such detailed information on a complex topic. Your effort to explain everything clearly will really help others understand and apply these concepts. Keep up the great work!

rachadlakis

Thank you for the tutorial. It is really helpful for learning beyond the PyTorch documentation.

normxu

Great introduction. Love the pace of the class and the balance of breadth vs depth.

jiankunli

Super high quality lecture. You have a gift of teaching, man. Thank you!

karanacharya

Dang. Never thought learning DDP would be this easy. More great content from Umar. Looking forward to FSDP.

tharunbhaskar

Amazing video. An ideal example of how a video lecture should be.

pulkitnijhawan

Absolutely amazing! You made these concepts so accessible!

thuanncats

Incredible content, Umar! Well done! 🎉

Maximos

Umar hits the sweet spot (Goldilocks zone) by balancing theory and practice 😄😄😄😄😄

nithinma

Amazing content! Thanks for sharing.

cken

It's amazing. Thank you, sir.

tribunetech

The video was very interesting and useful. Please make a similar video on DeepSpeed functionality, and in general on how to train large models (for example, LLaMA SFT) on distributed multi-server systems where the GPUs sit in different machines.

МихаилЮрков-тэ

You deserve many more likes and subscribers!

nova

Thank you so much for this amazing video. It is really informative.

prajolshrestha

Thank you very much for your wonderful video. Could you make a video on how to use the Accelerate library with DDP?

huu-lc