Distributed Data Parallel Model Training in PyTorch

This tutorial walks through distributed data parallel training in PyTorch using the DistributedDataParallel (DDP) module. We start with a simple non-distributed training job and end by deploying a training job across several GPUs on a single HAL node. Along the way, you will learn how DDP can accelerate model training, and how to monitor GPU status and profile your code so that you make full use of the available GPU computing power.
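As a rough illustration of the pattern the session covers, below is a minimal single-node, multi-GPU DDP training sketch. The toy model, hyperparameters, and the `torchrun --nproc_per_node=4 train_ddp.py` launch command are assumptions for illustration, not the session's actual script.

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each spawned process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each process builds the model on its own GPU and wraps it in DDP,
    # which keeps replicas in sync by all-reducing gradients during backward()
    model = nn.Linear(10, 1).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for step in range(100):
        # Random toy data; a real job would use a DataLoader with a
        # DistributedSampler so each rank sees a distinct data shard
        inputs = torch.randn(32, 10, device=local_rank)
        targets = torch.randn(32, 1, device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()   # gradients are averaged across all GPUs here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

While such a job runs, GPU utilization and memory usage on the node can be watched with a tool like `nvidia-smi` (for example, `watch -n 1 nvidia-smi`) to check that all GPUs are actually being kept busy.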

Instructor: Shirui Luo, NCSA Research Scientist
Session Date: February 15, 2023