Deep Dive: Model Distillation with DistillKit
In this deep dive video, we zoom in on model distillation, an advanced technique for building high-performance small language models at a reasonable cost. First, we explain what model distillation is. Then, we introduce two popular distillation strategies, logits distillation and hidden states distillation. We study in detail how they work and how they're implemented in the Arcee DistillKit open-source library. Finally, we look at two Arcee models built with distillation, Arcee SuperNova 70B and Arcee SuperNova Medius 14B.
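For readers who want a concrete picture of the first strategy before watching, here is a minimal PyTorch sketch of a logits distillation loss: the student is trained to match the teacher's temperature-softened output distribution via KL divergence. This is illustrative code under standard assumptions (Hinton-style softmax temperature), not DistillKit's actual API; the function name and shapes are made up for the example. A companion sketch for hidden states distillation follows the chapter list below.

```python
import torch
import torch.nn.functional as F

def logits_distillation_loss(student_logits: torch.Tensor,
                             teacher_logits: torch.Tensor,
                             temperature: float = 2.0) -> torch.Tensor:
    # Soften both distributions with a shared temperature, then push the
    # student's distribution toward the teacher's with KL divergence.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    kl = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

# Toy shapes: batch of 2, sequence length 8, vocabulary of 32k tokens.
student_logits = torch.randn(2, 8, 32_000)
teacher_logits = torch.randn(2, 8, 32_000)
loss = logits_distillation_loss(student_logits, teacher_logits)
```

In practice this term is usually blended with the regular cross-entropy loss on the ground-truth tokens, weighted by a mixing coefficient.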
Note: my calculation at 18:45 is wrong. It should be 2.3 tera tokens, not 2.3 peta tokens. Sorry about that 🤡
00:00 Introduction
00:30 What is model distillation?
04:55 Model distillation with DistillKit
11:20 Logits distillation
20:10 Logits distillation with DistillKit
26:10 Hidden states distillation
31:35 Hidden states distillation with DistillKit
36:00 Pros and cons
40:32 Distillation example: Arcee SuperNova 70B
42:50 Distillation example: Arcee SuperNova Medius 14B
44:40 Conclusion
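And here is a matching sketch for the second strategy, hidden states distillation (covered at 26:10): instead of matching output distributions, the student's intermediate representations are regressed onto the teacher's. Again, this is a generic PyTorch illustration, not DistillKit code; the learned projection layer is a common assumption for bridging models with different hidden sizes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def hidden_states_distillation_loss(student_hidden: torch.Tensor,
                                    teacher_hidden: torch.Tensor,
                                    projection: nn.Linear) -> torch.Tensor:
    # Project the teacher's (wider) hidden states down to the student's
    # hidden size, then minimize the mean squared error between the two.
    return F.mse_loss(student_hidden, projection(teacher_hidden))

# Toy shapes: teacher hidden size 8192 vs. student hidden size 4096.
student_hidden = torch.randn(2, 8, 4096)
teacher_hidden = torch.randn(2, 8, 8192)
projection = nn.Linear(8192, 4096, bias=False)  # trained jointly with the student
loss = hidden_states_distillation_loss(student_hidden, teacher_hidden, projection)
```

When teacher and student have different depths, a subset of teacher layers is typically mapped onto the student's layers before computing this loss.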