DDPM - Diffusion Models Beat GANs on Image Synthesis (Machine Learning Research Paper Explained)

#ddpm #diffusionmodels #openai

GANs have dominated the image generation space for the majority of the last decade. This paper shows for the first time how a non-GAN model, a DDPM, can be improved to overtake GANs on standard evaluation metrics for image generation. The produced samples look amazing, and, unlike GANs, the new model has a formal probabilistic foundation. Is there a future for GANs, or are Diffusion Models going to overtake them for good?

OUTLINE:
0:00 - Intro & Overview
4:10 - Denoising Diffusion Probabilistic Models
11:30 - Formal derivation of the training loss
23:00 - Training in practice
27:55 - Learning the covariance
31:25 - Improving the noise schedule
33:35 - Reducing the loss gradient noise
40:35 - Classifier guidance
52:50 - Experimental Results

Abstract:
We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for sample quality using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128×128, 4.59 on ImageNet 256×256, and 7.72 on ImageNet 512×512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.85 on ImageNet 512×512. We release our code at this https URL
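
The classifier guidance mentioned in the abstract shifts the mean of each reverse (denoising) step by the classifier's gradient, scaled by the step's variance. A toy sketch of just that mean shift: the quadratic log p(y|x) below is a hypothetical stand-in for a real classifier trained on noisy images; only the mean-shift formula itself comes from the paper.

```python
import numpy as np

def grad_log_classifier(x, target=np.array([1.0, -1.0])):
    """Hypothetical stand-in for grad_x log p(y|x): the gradient of a
    toy quadratic log-density -0.5 * ||x - target||^2."""
    return -(x - target)

def guided_mean(mu, sigma2, x, scale=1.0):
    """Classifier guidance: mu + scale * Sigma * grad_x log p(y|x),
    where mu and sigma2 are the unguided reverse-step mean and variance."""
    return mu + scale * sigma2 * grad_log_classifier(x)

mu = np.zeros(2)            # unguided mean from the diffusion model
x_t = np.array([0.5, 0.5])  # current noisy sample
print(guided_mean(mu, 0.1, x_t, scale=5.0))  # prints [ 0.25 -0.75]
```

A larger guidance scale pushes samples harder toward the class, trading diversity for fidelity, which is exactly the trade-off the abstract describes.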

Authors: Prafulla Dhariwal, Alex Nichol


If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments
YannicKilcher

My boyfriend wrote these papers. Go Alex Nichol!

SamanthaTries

Summary: self-supervised learning. Given a dataset of good images, keep adding Gaussian noise to them to create sequences of increasingly noisy images. Let the network learn to denoise images based on that. Then the network can "denoise" pure Gaussian noise into realistic pictures.

To do: learn a latent space (like a VAE-GAN does) so that it can smoothly interpolate between generated pictures and create nightmare art.

CosmiaNebula
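
The forward corruption described in this summary can be sketched directly. A minimal NumPy version, assuming a simple linear β schedule and a flat array standing in for an image; real implementations use the paper's schedules and batched image tensors.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_noising(x0, betas):
    """Build the sequence x_1..x_T by repeatedly applying the forward step
    x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * fresh Gaussian noise."""
    xs, x = [], x0
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
        xs.append(x)
    return xs

x0 = np.ones((64, 64))                 # stand-in "image"
betas = np.linspace(1e-4, 0.02, 1000)  # toy linear noise schedule
seq = forward_noising(x0, betas)
# After enough steps, x_T is approximately standard Gaussian noise:
print(round(float(seq[-1].std()), 1))
```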

That notation \mathcal{N}(x_t; \sqrt{1-\beta_t}\,x_{t-1}, \beta_t \mathbf{I}) sets my teeth on edge. Doing this with P, a general PDF, is fine, but I would always write x_t ~ \mathcal{N}(\sqrt{1-\beta_t}\,x_{t-1}, \beta_t \mathbf{I}), since \mathcal{N} is the Gaussian _distribution_ with a defined parameterization. BTW, the reason for the \sqrt{1-\beta_t} factor is to keep the energy of x_t approximately the same as the energy of x_{t-1}; otherwise, the image would explode to a variance of T\beta after T iterations. It's probably a good idea to keep the neural network inputs in about the same range every time.

scottmiller
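
The energy argument in the comment above is easy to check numerically: with the \sqrt{1-\beta} factor the per-pixel variance stays near 1, and without it the variance grows to roughly 1 + Tβ. A quick Monte-Carlo sketch (not from the paper, constant β for simplicity):

```python
import numpy as np

rng = np.random.default_rng(1)
T, beta, n = 500, 0.02, 100_000
x_scaled = rng.standard_normal(n)  # starts at unit variance
x_plain = x_scaled.copy()

for _ in range(T):
    noise = rng.standard_normal(n)
    x_scaled = np.sqrt(1 - beta) * x_scaled + np.sqrt(beta) * noise  # DDPM forward step
    x_plain = x_plain + np.sqrt(beta) * noise                        # no shrink factor

print(round(float(x_scaled.var()), 1))  # stays near 1
print(round(float(x_plain.var()), 1))   # blows up to about 1 + T*beta = 11
```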

Thanks a lot for the thorough explanation!

It's helping me figure out a topic for my master's degree.

Much much appreciated ^^

ahmedalshenoudy

Yannic, thanks for the video. The audio is a little soft even at max volume (unless I'm wearing my headphones). Is it possible to make it a bit louder?

linminhtoo

Historic video! Fun to see it now and compare it to the current state of image generation. I’ll check it again in two years to see how far we’ve got.

pedrogorilla

18:46 I guess it's very likely related to Shannon's sampling theorem: reconstructing the data distribution by sampling with the well-defined normal distribution. The number of time steps and β are closely related to the bandwidth of the data distribution.

binjianxin

Love it!! It's called the "number line" in English. Keep up the great work

andrewcarr

Can you please make a video about SNNs and the latest research on them?

MrBOB-hjjq

There is step-wise generation in GANs too, not based on steps from noise to image, but based on the size of the image, like in Pro-GAN and MSG-GAN. In these models you have discriminators for different sizes of the image, kind of.

proinn

This makes me think that instead of super-resolution from a lower-res image, it could be even more effective to store a sparse pixel array (with high-res positioning). You could even have another net learn a way of choosing, e.g., which 1000 pixels of a high-res image to store (the pixels providing the most information for reconstruction).

JamesAwokeKnowing

Great video! I was surprised to see this after the latest paper just a few days back! Thanks for the great explanations!

sshatabda

Any results (images) from generative models should be accompanied by the nearest neighbor (VGG latent, etc.) from the training dataset. I am going to train it on MNIST 🏋

bgjunge

Just amazing. I guess I might have spent another whole day reading this paper if I had missed your video. Grateful!

impromptu

Another question. If the network is predicting the noise added to a noisy image, what do you then do with that prediction? Subtract it from the noisy image? Do you then run it back through the network to again predict noise?

When you train this network, do you train it to only predict the small amount of noise added to the image between the forward process steps? Or does it try to predict all the noise added to the image from that point?

Or maybe it's more like the forward process? Starting with latent x_T as input to the network, the network gives you an "image" that it thinks is on the manifold (x_{T-1}). At this point, it most likely isn't, but you can move 1/T towards it, like we did moving towards the Gaussian noise to get to x_T. Then repeat...?


More examples and less math always helps...

easyBob
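
Re the question above: the network predicts the total noise ε in x_t, but at sampling time only a t-dependent fraction of it is removed per step, a little fresh noise is re-added, and this repeats for all T steps. A hedged sketch of that reverse loop; the ε-predictor here is a placeholder lambda, not a trained U-Net.

```python
import numpy as np

rng = np.random.default_rng(2)

def ddpm_sample(eps_model, betas, shape):
    """Ancestral sampling: start from pure noise x_T, then repeat
    x_{t-1} = (x_t - beta_t / sqrt(1 - abar_t) * eps_theta(x_t, t)) / sqrt(alpha_t)
              + sqrt(beta_t) * z,  with z ~ N(0, I) and no added noise at t = 0."""
    alphas = 1.0 - betas
    abars = np.cumprod(alphas)          # abar_t = prod_{s<=t} alpha_s
    x = rng.standard_normal(shape)      # x_T: pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)
        x = (x - betas[t] / np.sqrt(1.0 - abars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# Placeholder predictor (a real model is a trained U-Net conditioned on t):
eps_model = lambda x, t: x
betas = np.linspace(1e-4, 0.02, 100)
sample = ddpm_sample(eps_model, betas, (8, 8))
print(sample.shape)  # (8, 8)
```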

I would say that the sqrt(1-β) is used to converge to N(0, σ), mainly in its mean; otherwise adding Gaussian noise would just (in expectation) keep x_0 as the mean instead of 0.

bertobertoberto

I've only listened to 11 minutes so far, but DDPMs remind me a lot of compressed (or compressive) sensing...

stephanebeauregard

This is me being lazy and not looking it up, but if they predict the noise instead of the image, do they actually get the image by iteratively subtracting the predicted noise from the noisy image until they get a clean one?

CristianGarcia
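
Re the question above: essentially yes for sampling, but training never walks the chain step by step. The network is trained to predict the full noise ε at an arbitrary step t via the closed-form jump x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) ε. A minimal sketch of building one training example; the zero "prediction" is a placeholder for a real network output.

```python
import numpy as np

rng = np.random.default_rng(3)
betas = np.linspace(1e-4, 0.02, 1000)
abars = np.cumprod(1.0 - betas)  # abar_t = prod_{s<=t} (1 - beta_s)

def training_pair(x0, t):
    """Jump straight to step t in closed form; the regression target is eps."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(abars[t]) * x0 + np.sqrt(1.0 - abars[t]) * eps
    return x_t, eps

x0 = rng.standard_normal((8, 8))   # stand-in clean image
x_t, eps = training_pair(x0, t=500)

pred = np.zeros_like(eps)          # placeholder for eps_theta(x_t, t)
loss = float(np.mean((pred - eps) ** 2))  # the "simple" MSE training loss
print(loss > 0.0)  # True
```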

16:55 denoising depends on the entire data distribution because adding random noise in one step can be done independently of all previous steps; just add a bit of noise wherever you like. But removing noise (the reverse) has to assume noise was added over some number of previous steps. Thus, in the example of denoising a small child's drawing, it's not that we're removing ALL the noise. Instead, the dependence problem arises in simply taking a single step towards a denoised picture.

Can anyone clarify/confirm?

austin