Deep Dive: Model Distillation with DistillKit

In this deep dive video, we zoom in on model distillation, an advanced technique for building high-performance small language models at a reasonable cost. First, we explain what model distillation is. Then, we introduce two popular distillation strategies, logits distillation and hidden states distillation. We study in detail how they work and how they're implemented in Arcee's open-source DistillKit library. Finally, we look at two Arcee models built with distillation, Arcee SuperNova 70B and Arcee SuperNova Medius 14B.
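
As a quick preview of the 11:20 chapter, here is a minimal sketch of a logits distillation loss in plain PyTorch. This is not DistillKit's actual API, just a generic illustration of the technique: the temperature, the alpha blending weight, and the tensor shapes are illustrative assumptions, and it assumes the teacher and student share the same tokenizer and vocabulary.

```python
import torch.nn.functional as F

def logits_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student logits.

    Both tensors: (batch, seq_len, vocab_size), same vocabulary. The teacher
    is run under torch.no_grad() elsewhere, so gradients only reach the student.
    """
    vocab = student_logits.size(-1)
    student_log_probs = F.log_softmax(
        student_logits.view(-1, vocab) / temperature, dim=-1)
    teacher_probs = F.softmax(
        teacher_logits.view(-1, vocab) / temperature, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    # Scale by T^2 so gradient magnitudes stay comparable across temperatures.
    return kl * temperature**2

def training_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend the distillation signal with ordinary next-token cross-entropy."""
    ce = F.cross_entropy(student_logits.view(-1, student_logits.size(-1)),
                         labels.view(-1))
    kd = logits_distillation_loss(student_logits, teacher_logits)
    return alpha * kd + (1 - alpha) * ce
```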

Note: my calculation at 18:45 is wrong. It's 2.3 Tera tokens, not 2.3 Peta tokens. Sorry about that 🤡

00:00 Introduction
00:30 What is model distillation?
04:55 Model distillation with DistillKit
11:20 Logits distillation
20:10 Logits distillation with DistillKit
26:10 Hidden states distillation
31:35 Hidden states distillation with DistillKit
36:00 Pros and cons
40:32 Distillation example: Arcee SuperNova 70B
42:50 Distillation example: Arcee SuperNova Medius 14B
44:40 Conclusion
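
And as a companion to the 26:10 chapter, a similarly hedged sketch of hidden states distillation, again in generic PyTorch rather than DistillKit's real code. The class name, the single-layer pairing, and the linear projection (needed when the teacher's hidden size is larger than the student's) are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HiddenStateDistiller(nn.Module):
    """Align one student layer's activations with one teacher layer's."""

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Learned projection bridging the dimensionality gap; it is trained
        # alongside the student and discarded once distillation is done.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden: torch.Tensor,
                teacher_hidden: torch.Tensor) -> torch.Tensor:
        # Both inputs: (batch, seq_len, hidden_dim) activations from one
        # paired layer. MSE pulls the projected student representation
        # toward the frozen teacher's.
        return F.mse_loss(self.proj(student_hidden), teacher_hidden)

# Hypothetical usage with made-up dimensions:
# distiller = HiddenStateDistiller(student_dim=2048, teacher_dim=8192)
# loss = distiller(student_hidden, teacher_hidden)
```

In practice, several layer pairs are usually matched (for example, uniformly spaced layers across the two models), and the resulting losses are summed with the student's regular language modeling loss.
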
Comments

I thought distillation also means making the smaller model work as well as the large model on a *specific task*

zhivebelarus