The Images in-between | Before Diffusion: Variational Autoencoder (VAE) explained w/ KL Divergence


On the road to diffusion theory for text-to-image and text-to-video generative AI.

The answers to all your questions:
1. Enforcing a probabilistic prior on the latent space.
2. Restricting our posterior approximation to a Gaussian distribution.
3. The KL divergence as a regularization of the posterior.
4. Keeping the latent distribution compact around a subspace.
5. Smooth transitions between data points in latent space.
6. How can a generative system create such unseen images?
(Points 1-5 are illustrated in the code sketch below.)
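To make points 1-5 concrete, here is a minimal VAE sketch in PyTorch. It is an illustrative implementation under assumed choices (a 784-dimensional input as for flattened MNIST images, an MLP encoder/decoder, latent size 20); the names `VAE` and `loss_fn` are mine, not from the video.

```python
# A minimal VAE sketch. Dimensions (784 -> 400 -> 20) and the MLP
# architecture are illustrative assumptions, not details from the video.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, x_dim=784, h_dim=400, z_dim=20):
        super().__init__()
        self.enc = nn.Linear(x_dim, h_dim)
        self.mu = nn.Linear(h_dim, z_dim)      # posterior mean
        self.logvar = nn.Linear(h_dim, z_dim)  # posterior log-variance
        self.dec1 = nn.Linear(z_dim, h_dim)
        self.dec2 = nn.Linear(h_dim, x_dim)

    def encode(self, x):
        # Point 2: the approximate posterior q(z|x) is a diagonal Gaussian,
        # parameterized by a mean and a log-variance per latent dimension.
        h = F.relu(self.enc(x))
        return self.mu(h), self.logvar(h)

    def reparameterize(self, mu, logvar):
        # Sample z = mu + sigma * eps so gradients flow through mu and sigma.
        eps = torch.randn_like(mu)
        return mu + torch.exp(0.5 * logvar) * eps

    def decode(self, z):
        return torch.sigmoid(self.dec2(F.relu(self.dec1(z))))

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    # Reconstruction term plus the KL regularizer (points 1 and 3):
    # KL( N(mu, sigma^2) || N(0, I) ) has the closed form
    #   0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2),
    # which pulls the posterior toward the standard-normal prior and
    # keeps the latent distribution compact (point 4).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar)
    return recon + kl
```

Decoding a linear interpolation between two encoded latents, `decode((1 - t) * z1 + t * z2)`, then yields the smooth transitions of point 5: the "images in-between" of the title.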

Further reading

Outlook on the Vector-Quantized Variational Autoencoder (VQ-VAE) used in text-to-image AI tools (a sketch of its quantization step follows).
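As a glimpse of where the VQ-VAE differs: instead of a Gaussian posterior, the encoder output is snapped to the nearest vector in a learned codebook. The sketch below shows only that quantization step; the codebook size, dimensions, and the name `VectorQuantizer` are assumptions for illustration, not details from the video.

```python
# A hypothetical sketch of VQ-VAE's quantization step, assuming a learned
# codebook of num_codes embedding vectors of size code_dim.
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, code_dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)

    def forward(self, z_e):
        # z_e: (batch, code_dim) continuous encoder outputs.
        # Replace each encoder output with its nearest codebook entry.
        dists = torch.cdist(z_e, self.codebook.weight)  # (batch, num_codes)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx)
        # Straight-through estimator: argmin is not differentiable, so copy
        # gradients from z_q back to z_e. (The full VQ-VAE loss also adds
        # codebook and commitment terms, omitted here.)
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx
```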

#ai
#datascience
#stablediffusion
Comments

Great video explanation. I think I get the basic ideas of diffusion systems now, but I'm left with a strong sense, based on the terminology chosen, that researchers had a reasonable theory about why these would work for translating text between languages... but then seem to have stumbled across their utility in all these other domains. In some ways, based on the papers and explanations, it feels like hand-waving: the intellectual frameworks seem weak and orthogonal to the actual systems being built, like SD, DALL-E, etc. It really feels like there's a huge semantic theory hole. I wonder if hundreds of thousands of researchers are just throwing millions of variations of diffuser-style model architectures at the wall, and some work... and they build from there... survival of the fittest...

googleyoutubechannel

Awesome vid!! Do you have any paid courses on the fundamentals of transformers and the various types of NNs? Your teaching style is very compatible with my learning style, so I’d buy any courses you’ve created in a heartbeat!

yzz