Scaling laws for large language models
The lecture presents scaling laws, which describe the relationship between model size (number of parameters), training dataset size (number of tokens), and the amount of compute available for training. At the end, I also introduce one of the strangest phenomena in language model training: grokking.
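As a rough illustration of the idea, a widely used parametric form (the Chinchilla-style fit of Hoffmann et al., 2022) expresses expected loss as a function of parameter count N and token count D. The sketch below uses the constants reported in that paper purely for illustration; exact values depend on the model family and fitting procedure.

```python
def scaling_loss(n_params: float, n_tokens: float) -> float:
    """Chinchilla-style parametric loss fit: L(N, D) = E + A / N**alpha + B / D**beta.

    Constants are the published Hoffmann et al. (2022) fits, used here
    only as an illustration of how loss trades off against model and
    data scale.
    """
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Loss falls as either model size or training data grows:
small = scaling_loss(1e9, 20e9)     # ~1B params trained on ~20B tokens
large = scaling_loss(70e9, 1.4e12)  # ~70B params trained on ~1.4T tokens
print(f"small model loss ~ {small:.2f}, large model loss ~ {large:.2f}")
```

The key takeaway is that both terms shrink as power laws, so for a fixed compute budget there is an optimal balance between growing the model and feeding it more tokens.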