Training on multiple GPUs and multi-node training with PyTorch DistributedDataParallel

In this video we'll cover how multi-GPU and multi-node training works in general.

We'll also show how to do this using PyTorch DistributedDataParallel and how PyTorch Lightning automates this for you.
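
As a rough sketch of what the raw DistributedDataParallel setup involves (this is not the video's notebook; the tiny linear model, the random dataset, and the torchrun launcher are placeholder assumptions), one process per GPU might look like this:

```python
# Minimal DDP sketch, assuming it is launched with:
#   torchrun --nproc_per_node=<gpus per node> train.py
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and data, not anything from the video.
    model = nn.Linear(32, 1).cuda(local_rank)
    ddp_model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 32), torch.randn(1024, 1))
    # DistributedSampler gives each process a disjoint shard of the data.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(ddp_model(x), y)
            loss.backward()   # DDP all-reduces gradients across processes here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

PyTorch Lightning hides most of this boilerplate: you keep an ordinary LightningModule and ask the Trainer for multiple devices/nodes with the DDP strategy. The exact Trainer argument names have changed across Lightning versions, so check the docs for the release you use.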

Comments

dashsights: I just discovered you guys. Accelerate? Nah. Lightning DeepSpeed trainer? Woooo!

israelpradof: Just out of curiosity: in your great tutorials you mention starting multi-node GPU training with a bash SLURM script, so how can I train multi-node without SLURM? In the Trainer class I can't set the IP address for the worker and master nodes, so can this library run in a multi-node GPU environment on its own, without SLURM? Kind regards
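
(Context on the question above, not an answer from the video: plain torch.distributed does not require SLURM. Every process just needs to agree on a master address/port and its own rank, which a launcher such as torchrun can set via --nnodes/--node_rank/--master_addr, or which you can export yourself on each node. Lightning can also run multi-node outside SLURM by reading similar environment variables, but the exact variables and Trainer arguments depend on the version, so treat the sketch below, with its made-up address, as an illustration of the underlying rendezvous rather than the library's official recipe.)

```python
# Hedged sketch of SLURM-free multi-node rendezvous with plain torch.distributed.
# The address and port below are hypothetical; set real values on every node.
import os
import torch.distributed as dist

def init_distributed_without_slurm():
    # Required by the default env:// init method (torchrun sets these for you):
    #   MASTER_ADDR - IP/hostname of node 0
    #   MASTER_PORT - a free TCP port on node 0
    #   WORLD_SIZE  - total processes = num_nodes * gpus_per_node
    #   RANK        - this process's global index in [0, WORLD_SIZE)
    os.environ.setdefault("MASTER_ADDR", "10.0.0.1")  # hypothetical master IP
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="nccl", init_method="env://")
```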

edgarcin: If I have only 1 machine with 2 GPUs, which one do you recommend using, DDP or DP?
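
(For context, not the video authors' recommendation: the PyTorch documentation itself advises DistributedDataParallel over DataParallel even on a single machine, since DataParallel runs in one process and pays GIL plus scatter/gather overhead, while DDP uses one process per GPU. A toy sketch of the two call sites, assuming 2 local GPUs and a placeholder model:)

```python
# Toy comparison on one machine with 2 GPUs; nn.Linear is just a placeholder.
import torch.nn as nn

model = nn.Linear(32, 1)

# DataParallel: single process; replicates the model to both GPUs on each
# forward pass and gathers outputs back on GPU 0.
dp_model = nn.DataParallel(model.cuda(), device_ids=[0, 1])

# DistributedDataParallel: one process per GPU (e.g. launched with
# `torchrun --nproc_per_node=2 train.py`); it needs the process-group setup
# shown in the sketch above, after which gradients are synced via all-reduce.
# ddp_model = nn.parallel.DistributedDataParallel(
#     model.cuda(local_rank), device_ids=[local_rank])
```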

scotth.hawley: Notebook link is dead: "Notebook not found"

peterklemenc: At 2:45, was it meant to be written 'ddp_spawn', or is it 'ddp_spwan'?

kevinsasso: Why are you blinking like that? Are you OK?

rahuldeora: What about the learning rate? If I use 1 vs. 64 GPUs, how should I change it?
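
(Not an answer from the video: a common heuristic is the linear scaling rule from Goyal et al., "Accurate, Large Minibatch SGD", which scales the learning rate in proportion to the number of workers because the effective batch size grows the same way, usually paired with a warmup. The numbers below are placeholders.)

```python
# Hedged sketch of the linear scaling rule; all values are placeholders.
base_lr = 0.1                 # learning rate tuned for a single GPU
gpus = 64                     # processes in the DDP job (1 per GPU)
scaled_lr = base_lr * gpus    # LR grows with the effective batch size
# In practice this is usually combined with a warmup that ramps the LR from a
# small value up to scaled_lr over the first few epochs, and very large jobs
# may still need per-model tuning.
```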

andreipokrovsky: Typo at 1:10 in the video :) num_noes :)

israelpradof: Does this library need a SLURM cluster under the hood?