The challenges in Variational Inference (+ visualization)


In this video, we will look at the simple example of the Exponential-Normal model with one latent and one observed variable. Even in this simple example with one-dimensional random variables, the marginal, and therefore also the posterior, is intractable, which motivates the use of Variational Inference.
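As a sketch of what this model looks like in formulas (assuming an Exponential prior on the latent z with rate λ and a Gaussian likelihood whose mean is z; the concrete parameter values used in the video may differ):

p(z) = λ exp(−λ z) for z ≥ 0 (prior)
p(x | z) = N(x; z, σ²) (likelihood)
p(x, z) = p(x | z) p(z) (joint)
p(x) = ∫₀^∞ p(x | z) p(z) dz (marginal)
p(z | x) = p(x, z) / p(x) (posterior)

The first three lines can be evaluated for any concrete pair (x, z); the last two require the integral over z, which is exactly the part treated as intractable here.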

We are going to compare the probability distributions we have access to (the prior, the likelihood, and the joint) with the ones we do not have access to due to intractable integrals (the marginal and the posterior). This should show that "latent" does not necessarily mean "not computable". The small sketch below illustrates the distinction.
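A minimal Python sketch of this distinction (the rate, noise scale, and observed data point are hypothetical placeholders, not the video's values; scipy is used only for illustration). The prior, likelihood, and joint can be evaluated pointwise, while the marginal needs an integral over z and the posterior needs that marginal as its normalizer:

import numpy as np
from scipy import stats, integrate

lam, sigma = 1.0, 0.5        # hypothetical rate and noise scale
x_obs = 1.3                  # hypothetical observed data point

def prior(z):                # p(z): Exponential(lam) -- evaluable pointwise
    return stats.expon(scale=1.0 / lam).pdf(z)

def likelihood(x, z):        # p(x | z): Normal(z, sigma^2) -- evaluable pointwise
    return stats.norm(loc=z, scale=sigma).pdf(x)

def joint(x, z):             # p(x, z) = p(x | z) * p(z) -- evaluable pointwise
    return likelihood(x, z) * prior(z)

# Marginal p(x) = integral of p(x, z) over z: approximable numerically here only
# because z is one-dimensional; in general this integral is the intractable part.
marginal, _ = integrate.quad(lambda z: joint(x_obs, z), 0.0, np.inf)

def posterior(z):            # p(z | x) = p(x, z) / p(x): needs the marginal as normalizer
    return joint(x_obs, z) / marginal

print(joint(x_obs, 0.8), marginal, posterior(0.8))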


Timestamps:
00:00 Recap VI and ELBO
00:30 Agenda
00:52 Example: Exponential-Normal model
02:26 (1) We know the prior
04:15 (2) We know the likelihood
05:36 (3) We know the joint
06:34 (1) We do NOT know the marginal
08:15 (2) We do NOT know the (true) posterior
08:53 Why we want the posterior
09:51 Remedy: The surrogate posterior
10:31 Example for the ELBO
10:58 Fix the joint to the data
11:37 Being able to query the joint
12:56 Visualization
14:52 Outro
Comments

Just discovered your channel and I've got to say that I am really impressed by the amount of work you put into it. Looking forward to seeing other great videos like this! (Until then I have a lot to catch up on.)

Louis-mlzr

I've watched various university course lectures, some papers, and some blog posts for several days, and still couldn't understand “what we have”, “what we want to find”, etc.

You explain them concretely and even show a simple example where p(x) is intractable. It instantly made me understand, much appreciated! 🎉

kimyongtan

Just found this channel and I would like to thank you for your thoughtful work.

hugogabrielidis

Very helpful video, thank you so much 😊

thebluedragon

Amazing video! Great help, thank you for your effort in making such an excellent video!!!

clairedaddio

Very well explained! Earned my sub. Looking forward to more videos!

AnasAhmedAbdouAWADALLA

I am still struggling with the concept that in the beginning we already somewhat have the joint P(Z, D), which we can evaluate for values of Z and D to get probabilities, but we do not yet have the conditional P(Z|D). The joint itself already encodes the relationship between Z and D, no? Why do we want the conditional, which should effectively encode the same thing? (Perhaps I'll rewatch the part "Why we want the posterior" again.)

matej

Really nice, could you also make something more advanced about sparse Gaussian processes?

jiahao

Great video, I really liked the concrete example and the actual computation of integral approximations etc. I also really like how consistently you distinguish between what we know and what we don't know when defining the different distributions (i.e. *assuming we have a z*, we can plug in and get p(x, z)).

On that note, towards the end you talked about p(z, x=D), i.e. the joint over z and x where you've plugged in the observed dataset for x. You showed that this is actually not a valid probability distribution. Can you explain a bit more about why exactly that is the case? Why can't we simply treat the joint p(z, x=D) as the conditional? We are plugging in known data and getting a value representing the probability of the latent.

Thanks as always for amazing content, keep it up! :)

addisonweatherhead

Hello, this might be a trivial doubt, but at 12:01 you evaluate P(Z, X=D) for one observed datapoint. What if we have more than one datapoint? How will this equation be generalised? Thanks a ton for the video!!

ashitabhmisra

Thanks for a great video. You mentioned that in order to make the connection between x and z in the likelihood function, p(x|z), we make z the mean of the Gaussian. As you know, in a Gaussian we have the term (x-z)**2. Now, x and z can have very different dimensions! In that case, how on earth can we take their difference, let alone compute p(x|z)? Thanks

MLDawn

Great video! I have a question: why don't we just model p(x) with some known distribution like a Gaussian? Why do we have to compute the integral of p(x, z) w.r.t. z?

yccui

Thanks for the video. However, I would like to ask: in the visualization you computed the integral over Z; is this the marginal P(X)? But as you said earlier in the video (7:58), it is intractable to compute. Is there something I'm missing?

sfdv

Really good video as always. But just to make sure I understand the variational inference example: say we are doing a dog and cat image classification task, and in the dataset there are 40 percent dog images and 60 percent cat images. Z is the latent variable and X is the image. For the prior, P(Z = dog) = 0.4 and P(Z = cat) = 0.6? P(X|Z) is the likelihood of the data; we won't know the actual probability, but we can approximate it and train an approximator by using the negative log likelihood or some type of likelihood function? And for the variational inference we just want to know P(Z|X)? Is my example and understanding correct? Thanks

junhanouyang

Why do we not know the exact value of the posterior, when we know the posterior is proportional to the joint distribution, which can be calculated? Could you give a practical example to show that knowing the exact value of the posterior is needed for an application? Thank you.

jason

Thanks for your amazing videos about variational inference, they're extremely helpful! I have a question regarding the joint distribution p(z, x). It seems intuitive that we assume the latent variable's distribution p(z) is a given one, like a normal distribution, but what if we don't know the likelihood p(x|z)? Is it still possible to do variational inference, and how should I understand this in the example of images and camera settings? Thanks! 😊

tony

Maybe this is a stupid question, but is it true that the intractability of the marginal applies only to continuous distributions of Z? For discrete distributions we can always compute the summation of P(x, z) over all values of z to get P(x). Does this imply that variational inference is applicable only to continuous distributions?

ritupande

I'm not sure the claim that we can plug some value into a continuous likelihood and get a probability is correct. The probability of that exact value should be zero, because the measure of a single point is zero. Plus, p(x) can be greater than 1, and it's strange to have a probability of something greater than one. Only the integral of p(x) over the domain has to be one. Or have I missed something?

alexanderkhokhlov

Bro even explains the reason why he put a sad face.

hasankaynak

What I do not understand: when looking at the ELBO, we still compare the surrogate distribution q(Z) - which is a valid probability distribution - with the unnormalized p(Z, X=D), right? But why does this comparison even make sense? To me, it seems like VI is some magic that compares the surrogate q(Z) to an unnormalized probability instead of to the (unavailable) normalized conditional. Is this actually the gist of it?

besarpria