The challenges in Variational Inference (+ visualization)


In this video, we will look at the simple example of the Exponential-Normal model with one latent and one observed variable. Even in this simple example with one-dimensional random variables, the marginal, and therefore also the posterior, is intractable, which motivates the use of Variational Inference.
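As a sketch of what this model looks like in formulas (assuming an Exponential prior on the latent z with rate λ and a Gaussian likelihood whose mean is z; the concrete parameter values used in the video may differ):

p(z) = λ exp(−λ z) for z ≥ 0 (prior)
p(x | z) = N(x; z, σ²) (likelihood)
p(x, z) = p(x | z) p(z) (joint)
p(x) = ∫₀^∞ p(x | z) p(z) dz (marginal)
p(z | x) = p(x, z) / p(x) (posterior)

The first three lines can be evaluated for any concrete pair (x, z); the last two require the integral over z, which is exactly the part treated as intractable here.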

We are going to compare the probability distributions we have access to (the prior, the likelihood, and the joint) with the ones we do not have access to due to intractable integrals (the marginal and the posterior). This should show that "latent" does not necessarily mean "not computable". The small sketch below illustrates the distinction.
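A minimal Python sketch of this distinction (the rate, noise scale, and observed data point are hypothetical placeholders, not the video's values; scipy is used only for illustration). The prior, likelihood, and joint can be evaluated pointwise, while the marginal needs an integral over z and the posterior needs that marginal as its normalizer:

import numpy as np
from scipy import stats, integrate

lam, sigma = 1.0, 0.5        # hypothetical rate and noise scale
x_obs = 1.3                  # hypothetical observed data point

def prior(z):                # p(z): Exponential(lam) -- evaluable pointwise
    return stats.expon(scale=1.0 / lam).pdf(z)

def likelihood(x, z):        # p(x | z): Normal(z, sigma^2) -- evaluable pointwise
    return stats.norm(loc=z, scale=sigma).pdf(x)

def joint(x, z):             # p(x, z) = p(x | z) * p(z) -- evaluable pointwise
    return likelihood(x, z) * prior(z)

# Marginal p(x) = integral of p(x, z) over z: approximable numerically here only
# because z is one-dimensional; in general this integral is the intractable part.
marginal, _ = integrate.quad(lambda z: joint(x_obs, z), 0.0, np.inf)

def posterior(z):            # p(z | x) = p(x, z) / p(x): needs the marginal as normalizer
    return joint(x_obs, z) / marginal

print(joint(x_obs, 0.8), marginal, posterior(0.8))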


Timestamps:
00:00 Recap VI and ELBO
00:30 Agenda
00:52 Example: Exponential-Normal model
02:26 (1) We know the prior
04:15 (2) We know the likelihood
05:36 (3) We know the joint
06:34 (1) We do NOT know the marginal
08:15 (2) We do NOT know the (true) posterior
08:53 Why we want the posterior
09:51 Remedy: The surrogate posterior
10:31 Example for the ELBO
10:58 Fix the joint to the data
11:37 Being able to query the joint
12:56 Visualization
14:52 Outro
Comments

Just discovered your channel and I've got to say that I am really impressed by the amount of work you put into it. Looking forward to seeing other great videos like this! (Until then I have a lot to catch up on.)

Louis-mlzr

I've watched various university course lectures, some papers, and some blog posts for several days, and still couldn't understand “what we have”, “what we want to find”, etc.

You explain them concretely and even show a simple example where p(x) is intractable. It instantly made me understand, much appreciated! 🎉

kimyongtan

Just found this channel and I would like to thank you for your thoughtful work.

hugogabrielidis

Very helpful video, thank you so much 😊

thebluedragon

Amazing video! Great help, thank you for your effort in making such an excellent video!!!

clairedaddio

Very well explained! Earned my sub. Looking forward to more videos!

AnasAhmedAbdouAWADALLA

I am still struggling with the concept that in the beginning we already somewhat have the joint P(Z, D), which we can evaluate for values of Z and D to get probabilities, but we do not yet have the conditional P(Z|D). The joint itself already encodes the relationship between Z and D, no? Why do we want the conditional, which should effectively encode the same thing? (Perhaps I'll rewatch the part "Why we want the posterior" again.)

matej

Really nice, could you also make something more advanced about sparse Gaussian processes?

jiahao

Great video, I really liked the concrete example and the actual computation of integral approximations etc. I also really like how consistently you distinguish between what we know and what we don't know when defining the different distributions (i.e. *assuming we have a z*, we can plug in and get p(x, z)).

On that note, towards the end you talked about p(z, x=D), i.e. the joint over z and x where you've plugged in the observed dataset for x. You showed that this is actually not a valid probability distribution. Can you explain a bit more about why exactly that is the case? Why can't we simply treat the joint p(z, x=D) as the conditional? We are plugging in known data and getting a value representing the probability of the latent.

Thanks as always for amazing content, keep it up! :)

addisonweatherhead

Hello, this might be a trivial doubt, but at 12:01 you evaluate P(Z, X=D) for one observed datapoint. What if we have more than one datapoint? How will this equation be generalised? Thanks a ton for the video!!

ashitabhmisra

Thanks for a great video. You mentioned that in order to make the connection between x and z in the likelihood function, p(x|z), we make z the mean of the Gaussian. As you know, in a Gaussian we have the term (x-z)**2. Now, x and z can have very different dimensions! In that case, how on earth can we take their difference, let alone compute p(x|z)? Thanks

MLDawn

Great video! I have a question: why don't we just model p(x) with some known distribution like a Gaussian? Why do we have to compute the integral of p(x, z) w.r.t. z?

yccui

Thanks for the video. However, I would like to ask: in the visualization you computed the integral over Z; is this the marginal P(X)? But as you said earlier in the video (7:58), it is intractable to compute. Is there something I'm missing?

sfdv

Really good video as always. But just to make sure I understand the variational inference example: say we are doing a dog and cat image classification task, and in the dataset there are 40 percent dog images and 60 percent cat images. Z is the latent variable and X is the image. For the prior, P(Z = dog) = 0.4 and P(Z = cat) = 0.6? P(X|Z) is the likelihood of the data; we won't know the actual probability, but we can approximate it and train an approximator by using the negative log likelihood or some type of likelihood function? And for the variational inference we just want to know P(Z|X)? Is my example and understanding correct? Thanks

junhanouyang

Why do we not know the exact value of the posterior, when we know the posterior is proportional to the joint distribution, which can be calculated? Could you give a practical example to show that knowing the exact value of the posterior is needed for an application? Thank you.

jason

Thanks for your amazing videos about variational inference, they're extremely helpful! I have a question regarding the joint distribution p(z, x). It seems intuitive that we assume the latent variable's distribution p(z) is a given one, like a normal distribution, but what if we don't know the likelihood p(x|z)? Is it still possible to do variational inference, and how should I understand this in the example of images and camera settings? Thanks! 😊

tony

Maybe this is a stupid question, but is it true that the intractability of the marginal applies only to continuous distributions of Z? For discrete distributions we can always compute the summation of P(x, z) over all values of z to get P(x). Does this imply that variational inference is applicable only to continuous distributions?

ritupande

I'm not sure the claim that we can plug some value into a continuous likelihood and get a probability is correct. The probability of that exact value should be zero, because the measure of a single point is zero. Plus, p(x) can be greater than 1, and it's strange to have a probability of something greater than one. Only the integral of p(x) over the domain has to be one. Or have I missed something?

alexanderkhokhlov

Bro even explains the reason why he put a sad face.

hasankaynak

What I do not understand: when looking at the ELBO, we still compare the surrogate distribution q(Z) - which is a valid probability distribution - with the unnormalized p(Z, X=D), right? But why does this comparison even make sense? To me, it seems like VI is some magic that compares the surrogate q(Z) to an unnormalized probability instead of to the (unavailable) normalized conditional. Is this actually the gist of it?

besarpria