Knowledge Distillation | Machine Learning

We all know that ensembles outperform individual models. However, increasing the number of models also makes inference (evaluating new data) more costly. This is where knowledge distillation comes to the rescue... do watch to find out how!
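For readers who want the gist in code, here is a minimal sketch of the student's training objective; the PyTorch framing, the temperature of 4, and the 0.7 soft/hard weighting are illustrative assumptions, not details taken from the video:

```python
# Minimal knowledge-distillation loss sketch (assumed PyTorch setup).
# The teacher (e.g. an ensemble or a large model) is already trained;
# the smaller student learns from the teacher's softened predictions,
# so only the student needs to be evaluated at inference time.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.7):
    # Soften both distributions with the temperature, then compare them
    # with KL divergence (scaled by T^2 to keep gradient magnitudes stable).
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2
    # Ordinary cross-entropy against the ground-truth hard labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Inside a training loop the teacher is frozen and run without gradients:
#   with torch.no_grad():
#       teacher_logits = teacher(x)
#   loss = distillation_loss(student(x), teacher_logits, y)
```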
Comments

Very clear and concise with proper introduction!

softerseltzer

Precise and to the point. Thank you for this awesome video.

prasundatta

I am not sure I understand where the gain in training time is if the student has to learn from the teacher's predictions. Wouldn't that mean we still have to train the large N x K model?

MeshRoun

You said in a reply to a query that inference time is reduced while training time remains the same. How is that possible? Could you explain?

kavyagupta

Crisp and clear! Why have we used the cross-entropy function here?

aashishrana

Awesome explanation mate, waiting for more videos!!!

wolfisraging

Thank you! I was recently reading about this topic but was having trouble understanding. Your explanation was fantastic. Is knowledge distillation really just replacing the ground truth "hard" labels in the dataset with the teacher's soft labels?

victorsuciu
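On the hard-vs-soft-label question above: in the usual recipe the teacher's soft targets supplement the ground-truth hard labels rather than replace them outright, and the student trains on a weighted mix of both. A toy illustration of what the soft targets look like; the class count, logits, and temperatures are made-up numbers, and the PyTorch framing is an assumption:

```python
# Illustrative only: contrast a hard one-hot label with the teacher's
# softened distribution for the same hypothetical 3-class example.
import torch
import torch.nn.functional as F

hard_label = torch.tensor([0.0, 1.0, 0.0])       # ground-truth "hard" target

teacher_logits = torch.tensor([1.0, 4.0, 2.0])   # made-up teacher output
soft_t1 = F.softmax(teacher_logits / 1.0, dim=-1)  # ~[0.04, 0.84, 0.11]
soft_t4 = F.softmax(teacher_logits / 4.0, dim=-1)  # ~[0.23, 0.48, 0.29]

# The soft targets preserve the teacher's view of how similar the classes
# are ("dark knowledge"); a higher temperature spreads that out further.
print(hard_label, soft_t1, soft_t4)
```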

Can you explain it a little more? How do we select the parameters for the student architecture?

UmamahBintKhalid

"Imagine, we have a three way"
what????

xiangyangfrank