Scaling laws for large language models
The lecture presents scaling laws, which describe the relationship between model size (number of parameters), training dataset size (number of tokens), and the amount of compute available for training. At the end, I also introduce one of the strangest phenomena in language model training: grokking.
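As a rough illustration of the idea, a widely used parametric form (the Chinchilla-style fit of Hoffmann et al., 2022) expresses expected loss as a function of parameter count N and token count D. The sketch below uses the constants reported in that paper purely for illustration; exact values depend on the model family and fitting procedure.

```python
def scaling_loss(n_params: float, n_tokens: float) -> float:
    """Chinchilla-style parametric loss fit: L(N, D) = E + A / N**alpha + B / D**beta.

    Constants are the published Hoffmann et al. (2022) fits, used here
    only as an illustration of how loss trades off against model and
    data scale.
    """
    E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
    return E + A / n_params**alpha + B / n_tokens**beta

# Loss falls as either model size or training data grows:
small = scaling_loss(1e9, 20e9)     # ~1B params trained on ~20B tokens
large = scaling_loss(70e9, 1.4e12)  # ~70B params trained on ~1.4T tokens
print(f"small model loss ~ {small:.2f}, large model loss ~ {large:.2f}")
```

The key takeaway is that both terms shrink as power laws, so for a fixed compute budget there is an optimal balance between growing the model and feeding it more tokens.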