Thanks for the explanation. I have a question, though. After training the student model on two steps of the teacher, why do they make the student the new teacher rather than continuing to train the student model on FOUR steps of the original teacher (and so on)? My naïve feeling is that training student 2 on two steps of student 1, and so on, could introduce compounding performance loss relative to the original teacher.
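For context, the halving schedule the question refers to can be sketched roughly like this (an illustrative sketch, not the paper's actual training code; all names here are made up): each round, a student is trained to match two steps of its current teacher, and that student then becomes the teacher for the next round, so the required step count halves every round.

```python
def progressive_distill(teacher_steps, rounds, train_student):
    """Illustrative progressive-distillation loop.

    teacher_steps: sampling steps of the original teacher
    rounds:        number of halving rounds
    train_student: callable (teacher, student_steps) -> student;
                   stands in for the actual distillation training.
    """
    teacher = ("teacher", teacher_steps)
    for _ in range(rounds):
        student_steps = teacher[1] // 2          # student needs half the steps
        student = train_student(teacher, student_steps)
        teacher = student                        # student becomes the new teacher
    return teacher


# Toy stand-in for training; just records the step count.
def toy_train(teacher, steps):
    return (f"student@{steps}", steps)


final = progressive_distill(1024, 3, toy_train)
print(final)  # after 3 rounds: 1024 / 2**3 = 128 steps
```

One practical point this loop makes visible: distilling student N directly against 2**N steps of the *original* teacher would require running the teacher exponentially many steps per training target, whereas rebasing on the latest student keeps every round at a fixed two-step target.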
Adreitz
Thanks for a great paper overview.
Do you know if anyone has applied progressive distillation to Stable Diffusion?
Is this something Stability would have to do by retraining the entire model?
Or could there be a way to do transfer learning on the existing released Stable Diffusion model?
haikutechcenter
Hi, thanks for this great video! Do you know where the weighting of the loss function, w(lambda), is derived/explained? I'd like to better understand it and whether I should use it.
yonistoller
Can someone explain the DDIM paper in the comments?