GaLore: memory-efficient training

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.

1:00 Discussion of LoRA
2:10 Adam in the low-rank space
5:50 How to get the low-rank representation of the gradient
10:22 The GaLore algorithm
16:02 The subspace search in GaLore
18:07 What is the subspace update period?
21:57 Comparison between LoRA and GaLore
26:50 Using GaLore for pretraining
29:04 Theoretical discussion of GaLore

Yu-Chung Wang
