Stable/Latent Diffusion - High-Resolution Image Synthesis with Latent Diffusion Models Explained

Comments

I love how they combined VAE, GANs (adversarial loss) and diffusion models

gonzalorubio

This is the only detailed, non-hype based walkthrough of how SD works, thanks. Especially for explaining the math.

thecheekychinaman

These videos are criminally underrated! This one, ViT, Attention, and LoRA have helped me so much with my learning! As a CompSci student majoring in AI, going from lectures and books to reading, understanding, and implementing the actual papers is a big leap, and you've made that leap a lot simpler and more digestible. Thank you so much, please never stop this series!

hiepphamduc

This was a really good video. It really helped me understand this diffusion concept that I didn't know about. Your videos are underrated, but I have no doubt they will gain traction over time.

acasualviewer

Congratulations on the video. I've always had doubts about whether Stable Diffusion is the same thing as Latent Diffusion. Now with your explanation I understand that they are the same thing.

claudeclaude

Thank you so much Gabriel! I wanted to understand the intuition behind Latent Diffusion, and watching your video saved me tons of time from actually reading through the paper.

lzh

Extremely underrated video. Thanks so much for all the explanations!

aesadugur

Thank you for the video. It's the first one I found about training the AE in LDMs, and I think that part is the hardest to understand in the whole model; your explanation makes it very easy to follow. One thing I would add is that the AE in the paper is based on VQ-VAE, so L_rec uses a perceptual loss and L_adv is a patch-based adversarial objective. Anyway, I hope you will continue this series!
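To make the comment above concrete, here is a minimal sketch of how those pieces could combine into the first-stage autoencoder objective. The weights, the non-saturating generator term, and the function names are illustrative assumptions, not the paper's exact code; the real model averages patch logits from a learned discriminator network.

```python
import math

def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def generator_adv_loss(patch_logits):
    """Non-saturating adversarial term over per-patch discriminator
    logits: the autoencoder wants every patch scored as 'real'."""
    return sum(softplus(-s) for s in patch_logits) / len(patch_logits)

def ae_total_loss(l_perceptual, patch_logits, l_reg, w_adv=0.5, w_reg=1e-6):
    """Total first-stage loss: perceptual reconstruction (L_rec)
    + weighted patch adversarial term (L_adv)
    + a small KL (or VQ) regularizer on the latent."""
    return l_perceptual + w_adv * generator_adv_loss(patch_logits) + w_reg * l_reg
```

When the discriminator confidently scores all patches as real (large positive logits), the adversarial term vanishes and the loss reduces to the perceptual reconstruction plus the tiny latent regularizer.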

Anonymou

Really enjoyed every minute. You've got a new subscriber.

AI_For_Scientists

Hello, thanks for the explanations! Just a quick note on the Greek letters: it's "psi", not "phi", here, and the "rho"_theta you mention is actually a "tau"_theta.

ahamuffin

Thanks for the explanation, man! Amazing video!

denistimonin

Fantastic videos! Any plan for the recently published ControlNet?

alexxiang

Good video, but it didn't explain how the cross-attention output is actually used in the UNet.
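On the question above, a common reading of the paper is that inside each UNet block the spatial feature map is flattened into "pixel" tokens that act as queries, while the text embeddings supply keys and values; the attention output is then added back to the features through a residual connection, so the text conditions every resolution level. The sketch below is illustrative (single head, no learned projections), not the repository's code:

```python
import math

def cross_attention(pixel_tokens, text_tokens):
    """pixel_tokens: list of d-dim feature vectors (the queries).
    text_tokens: text embeddings, used here as both keys and values.
    Returns features with the attended text added residually."""
    d = len(pixel_tokens[0])
    out = []
    for q in pixel_tokens:
        # scaled dot-product scores against every text token
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in text_tokens]
        # softmax over the text tokens
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        w = [x / z for x in w]
        # weighted sum of the values (= text tokens here)
        attended = [sum(wi * v[j] for wi, v in zip(w, text_tokens))
                    for j in range(d)]
        # residual connection: add the attention output to the feature
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out
```

In the real model the queries, keys, and values each pass through learned linear projections first, and multiple heads run in parallel; this stripped-down version only shows where the text enters the computation.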

NadavBenedek

Does the text embedding serve as a label in this model? For example, I put in a picture of a penguin and describe it as "penguin", and the model learns to match the picture and the text and reduce the loss.

hunterli

Hey, what's the setup you are using to write and see the paper on split-screen?

prateekpani

Thank you very much for an amazing explanation!
One question, though: at minute 22:00, when explaining the autoencoder loss function, you apply the log to the output of the discriminator. Isn't that a bit problematic, since log is not defined when the discriminator predicts "fake" (an output of 0)?
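On the log(0) concern above: a standard resolution (an assumption here, not something the video states) is that implementations keep the discriminator output as a raw score s (a logit) rather than a probability, so log D(x) = log sigmoid(s) = -softplus(-s), which is finite for every real s:

```python
import math

def log_sigmoid(s):
    """Numerically stable log(sigmoid(s)) = -log(1 + exp(-s)).
    Never evaluates log at 0, even for very negative scores."""
    if s >= 0:
        return -math.log1p(math.exp(-s))
    # for s < 0, rewrite to avoid overflow in exp(-s)
    return s - math.log1p(math.exp(s))
```

Even when the discriminator is extremely confident an input is fake (s very negative), the result is a large negative number rather than -inf, so the gradient stays well defined.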

gabrielsamberg

How do you know the "epsilon"? I mean, for the second training step, where you compute MSE(noise, predicted noise), how do you know the "noise" beforehand? Does it come from a function "P"? Also, what does it mean to train the diffusion layers? Are diffusion layers also like convolutions?
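On the question above: epsilon is not known beforehand, it is sampled fresh from a standard normal at every training step, used to corrupt the clean latent, and the UNet is trained to recover that very sample. A pure-Python sketch of one DDPM-style training example (the linear beta schedule and function names are illustrative assumptions):

```python
import math
import random

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, out = 1.0, []
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        alpha_bar *= 1.0 - beta
        out.append(alpha_bar)
    return out

def diffusion_training_example(z0, alpha_bars):
    """One training example: (noisy latent, timestep, target epsilon)."""
    t = random.randrange(len(alpha_bars))
    # the ground-truth noise is simply drawn here, so it is known exactly
    eps = [random.gauss(0.0, 1.0) for _ in z0]
    ab = alpha_bars[t]
    z_t = [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e
           for z, e in zip(z0, eps)]
    return z_t, t, eps  # the loss would be MSE(unet(z_t, t), eps)
```

The "diffusion layers" being trained are just the UNet's ordinary layers (convolutions, attention, etc.); "diffusion" names the training objective, not a special layer type.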

tushargarg

Why is it better to predict the noise instead of the denoised image directly in the UNet? Thanks for your videos.

anoubhav

Really nice video. I have a doubt about the latent loss: 1) Is the discriminator trying to classify the encoder-decoder output as fake and the real input as real? 2) If so, shouldn't the second term in the loss function (the discriminator output on the encoder-decoder reconstruction) have a plus sign?

oxxdkvy

I love your videos, I've been following you since the girlfriend video. Can you please explain RWKV models?

namidasora