Tips and tricks for distributed large model training

Discover several distribution strategies and related concepts for data-parallel and model-parallel training. Walk through an example of training a 39-billion-parameter language model on TPUs, and conclude with the challenges and best practices of orchestrating large-scale language model training.
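
The talk itself does not publish code, but the core idea of synchronous data parallelism it covers can be sketched in a few lines of JAX: replicate the parameters on every device, give each device its own slice of the global batch, and average the gradients across devices before applying the update. Everything below (the toy linear model, loss_fn, train_step, the batch shapes, and the learning rate) is a hypothetical illustration written for this description, not code from the session.

import functools

import jax
import jax.numpy as jnp


def loss_fn(params, batch):
    # Toy linear model; a real workload would be a large Transformer.
    preds = batch["x"] @ params["w"] + params["b"]
    return jnp.mean((preds - batch["y"]) ** 2)


@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, batch):
    # Each device computes gradients on its local shard of the global batch.
    grads = jax.grad(loss_fn)(params, batch)
    # All-reduce (mean) the gradients so every replica applies the same update;
    # this collective is what makes the step synchronous data parallelism.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)


n_devices = jax.local_device_count()
features = 8

# Replicate the parameters onto every local device (TPU cores or GPUs).
params = {"w": jnp.zeros((features, 1)), "b": jnp.zeros((1,))}
params = jax.device_put_replicated(params, jax.local_devices())

# Shard the global batch along a leading device axis:
# (n_devices, per_device_batch, ...).
batch = {
    "x": jnp.ones((n_devices, 32, features)),
    "y": jnp.ones((n_devices, 32, 1)),
}
params = train_step(params, batch)

For models that no longer fit on a single device, the same program can be extended with model parallelism (sharding the parameters themselves across devices), which is the second half of the talk.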

Speakers: Nikita Namjoshi, Vaibhav Singh

#GoogleIO
Comments

Thanks! Could you also share a link to a data parallelism implementation?

roy

This video, especially the second half, was a bit difficult to follow given how much knowledge it assumed.

wryltxw