GaLore: memory-efficient training
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection.
1:00 Discussion of LoRA
2:10 Adam in the low-rank setting
5:50 How to get the low-rank representation of the gradient
10:22 The GaLore algorithm
16:02 The subspace search in GaLore
18:07 What is the subspace update period?
21:57 Comparison between LoRA and GaLore
26:50 Using GaLore for pretraining
29:04 Theoretical discussion of GaLore