Thanks for the explanation. I have a question, though. After training the student model on two steps of the teacher, why do they make the student the new teacher rather than continuing to train the student model on FOUR steps of the original teacher (and so on)? My naïve feeling is that training student 2 on two steps of student 1, and so on, could introduce compounding performance loss relative to the original teacher.
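For context, the halving schedule the question refers to can be sketched roughly like this (an illustrative sketch, not the paper's actual training code; all names here are made up): each round, a student is trained to match two steps of its current teacher, and that student then becomes the teacher for the next round, so the required step count halves every round.

```python
def progressive_distill(teacher_steps, rounds, train_student):
    """Illustrative progressive-distillation loop.

    teacher_steps: sampling steps of the original teacher
    rounds:        number of halving rounds
    train_student: callable (teacher, student_steps) -> student;
                   stands in for the actual distillation training.
    """
    teacher = ("teacher", teacher_steps)
    for _ in range(rounds):
        student_steps = teacher[1] // 2          # student needs half the steps
        student = train_student(teacher, student_steps)
        teacher = student                        # student becomes the new teacher
    return teacher


# Toy stand-in for training; just records the step count.
def toy_train(teacher, steps):
    return (f"student@{steps}", steps)


final = progressive_distill(1024, 3, toy_train)
print(final)  # after 3 rounds: 1024 / 2**3 = 128 steps
```

One practical point this loop makes visible: distilling student N directly against 2**N steps of the *original* teacher would require running the teacher exponentially many steps per training target, whereas rebasing on the latest student keeps every round at a fixed two-step target.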
Adreitz
Thanks for a great paper overview.
Do you know if anyone has applied progressive distillation to Stable Diffusion?
Is this something Stability would have to do by retraining the entire model?
Or could there be a way to do transfer learning on the existing released Stable Diffusion model?
haikutechcenter
Hi, thanks for this great video! Do you know where the weighting of the loss function, w(lambda), is derived/explained? I'd like to better understand it and whether I should use it.
yonistoller
Can someone explain the DDIM paper in the comments?