How I trained a simple Text to Image Diffusion Model on my laptop from scratch!

In just 15 points, we cover everything you need to know about generative AI diffusion models - from the basics to Latent Diffusion Models (LDMs) and text-to-image conditional latent diffusion models. I also train a diffusion model with PyTorch on my laptop to demonstrate how it all works.
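If you want the core idea before watching: a diffusion model is trained to predict the noise that was added to an image. Below is a minimal sketch of one DDPM-style training step in PyTorch (an illustration only, not the exact code from the video; the linear schedule values and the model's (x_t, t) signature are assumptions):

```python
import torch
import torch.nn.functional as F

# Hypothetical linear noise schedule over T steps (values are illustrative).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def training_step(model, x0):
    """One DDPM training step: noise a clean batch x0 of shape (B, C, H, W)
    to a random timestep t, then train the model to predict that noise."""
    t = torch.randint(0, T, (x0.shape[0],))
    noise = torch.randn_like(x0)
    ab = alpha_bars[t].view(-1, 1, 1, 1)
    # Forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)
```

Running this inside a standard optimizer loop (loss.backward(), optimizer.step()) is essentially the whole training procedure; the rest is architecture and conditioning.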

To access the full code repo, the 15-minute code walkthrough video, the 4000+ word script, 15+ animations, and the PowerPoint slides used in this video (as well as others on my channel), please consider supporting us on Patreon or YouTube. It helps the channel massively, so thanks for considering.

#diffusion #ai #machinelearning #generativeai

Related videos:

Papers:

Timestamps:
0:00 - Intro
1:40 - 1
2:43 - 2
3:24 - 3
5:59 - 4
8:09 - 5
9:49 - 6
11:07 - 7
11:55 - 8
14:11 - 9
16:15 - 10
18:49 - 11
19:48 - 12
21:03 - 13
22:07 - 14
23:27 - 15
Comments

As a user of ComfyUI (a tool built on Stable Diffusion), I've always been curious about how the Stable Diffusion model creates images. After reading many articles and watching countless YouTube videos that were either too academic or too superficial, I found this to be the only video that really satisfied my curiosity. Thank you so much for making such a valuable video. Wishing your channel continued growth and looking forward to more great content like this!

thanhphamduy

You brought some serious points to light that I couldn’t previously see, thank you!

csmith

Great content with so many deep concepts.

mayank_

Awesome video, mate ❤
Looking forward to the next one.

zendr

You know, sir, I found your channel purely by accident, but thank God I did. Whatever you are teaching us is absolute gold.
There is one thing I have to ask, though: I am really curious about your background. You have never shared your LinkedIn profile with us. How do you know this stuff so deeply?
Finally, sir, you are awesome. Have a nice day.

rishiroy

I really love your channel. Keep up the good work!

jqkclzy

This is a very informative video, thank you so much! Please explain how to code a Rectified Flow neural network next 🙏🙏 and how it differs from Stable Diffusion 🤔

hilmiyafia

Thank you for your great work! Where can I get the dataset for the conditional generative model?

tanvikumari

Good video! I'm very impressed with your results. One thing I'm confused about: during sampling there was a +σ_t * z term. I assume z is noise, but what is the sigma term? What determines how much extra noise to add at each sampling step?

marinepower
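For readers with the same question: in the original DDPM formulation the reverse step is x_{t-1} = (1/√α_t) · (x_t - (β_t/√(1-ᾱ_t)) · ε_θ(x_t, t)) + σ_t·z, and σ_t is typically set to √β_t (a simple, common choice; a slightly smaller "posterior" variance also works), with no noise added at the final step. A minimal sketch of that step, reusing the hypothetical schedule from the training snippet above:

```python
import torch

# Same hypothetical schedule as in the training sketch above.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_step(model, x_t, t):
    """One DDPM reverse step. The +sigma_t * z term re-injects fresh noise;
    DDPM sets sigma_t = sqrt(beta_t) and adds no noise at the final step."""
    t_batch = torch.full((x_t.shape[0],), t, dtype=torch.long)
    eps = model(x_t, t_batch)  # predicted noise
    mean = (x_t - (betas[t] / (1.0 - alpha_bars[t]).sqrt()) * eps) / alphas[t].sqrt()
    if t == 0:
        return mean
    sigma = betas[t].sqrt()  # the sigma_t term in question
    return mean + sigma * torch.randn_like(x_t)
```

Iterating this from t = T-1 down to 0, starting from pure Gaussian noise, yields a sample.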

Great job! Especially considering that these models are not easy to train.
I had also never considered training CelebA with text conditioning, which seemed to produce good results given the training time.
One critique: you made a mistake when describing CLIP and cross-attention. CLIP uses a transformer image encoder and a transformer text encoder, which are jointly trained - it may be possible to use a frozen VAE for the image encoder, but that would probably constrain the latent space and prevent strong semantic alignment.
For cross-attention, K and V come from CLIP, whereas Q comes from the image tokens (you reversed them on your slide). Flipping them would also likely work, but then the cross-attention would be modulating existing image features rather than introducing new features based on the conditioning.

hjups
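To make the cross-attention wiring described above concrete, here is a minimal single-head sketch of the standard Stable-Diffusion-style layout (an illustration, not code from the video): queries are projected from the image (latent) tokens, while keys and values are projected from the text-encoder embeddings.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Text-to-image cross-attention: Q from image tokens,
    K and V from the text-encoder (e.g. CLIP) embeddings."""
    def __init__(self, img_dim, txt_dim, attn_dim):
        super().__init__()
        self.to_q = nn.Linear(img_dim, attn_dim)
        self.to_k = nn.Linear(txt_dim, attn_dim)
        self.to_v = nn.Linear(txt_dim, attn_dim)
        self.to_out = nn.Linear(attn_dim, img_dim)

    def forward(self, img_tokens, txt_tokens):
        # img_tokens: (B, N_img, img_dim); txt_tokens: (B, N_txt, txt_dim)
        q = self.to_q(img_tokens)
        k = self.to_k(txt_tokens)
        v = self.to_v(txt_tokens)
        # Scaled dot-product attention over the text tokens.
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return self.to_out(attn @ v)  # same shape as img_tokens
```

Because the output has one row per image token, the text conditioning injects information into image positions rather than the other way around, which is exactly the asymmetry the comment points out.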

Great video!
If possible, please provide the code as well.

naveengeorge