26 - Prior and posterior predictive distributions - an introduction

This video provides an introduction to the concepts of prior and posterior predictive distributions.

Comments

The more you learn, the more confused you get

looploop

I come back to this video so many times now. Really good, thank you!!

AnasHawasli

These are excellent lectures. Thank you for all your nice work

NehadHirmiz

Fantastic. Thank you! I have been struggling to understand this until I watched this video.

jinrussell

Cool video! I finally understood it! There isn't much about this topic on the web... thank you!

olichka

Very well explained!

Didn't get it though

thischannelhasaclevername

I don't understand what theta is, so this was confusing.

theforestgardener

Full Text: In this video I want to explain the concepts of prior and posterior predictive distributions.

Starting with the prior predictive distribution: what exactly do we mean by this concept? It's quite simple really. It's just the distribution of data which we think we are going to obtain before we actually see the data. An example here might be flipping a coin 10 times, where every time a heads comes up we call that a value of 1, and every time a tails comes up we call that a value of 0. If we're flipping it 10 times and we think that the coin is relatively fair, then our frequency distribution for the values we think we might obtain might look something like this yellow line which I'm drawing here, relatively centred around the 5 mark. And by frequency distribution I just mean a probability distribution, so our PDF looks something like this. This would be our prior predictive distribution, and it is based on our prior knowledge about the situation.

But how can we actually calculate the prior predictive distribution? The idea is that what we're trying to obtain is the probability of our data, so the probability of Y in this circumstance, and this is a marginal probability. We know that we can get to a marginal probability by integrating out all dependence on theta, our parameter, from the joint probability of Y and theta. So we integrate over the whole range in which theta can sit (theta belongs to the set capital Theta), and by integrating this out across all values of theta we remove the theta dependence and we're just left with a marginal probability. Furthermore, we know that we can rewrite this using the rule of conditional probability, which tells us that the conditional probability of Y given theta is equal to the joint probability of Y and theta divided through by the probability of theta (this isn't so much Bayes' rule as just the rule of conditional probability). That means we can multiply through by the probability of theta, our prior, and that gives us our joint probability. This allows us to rewrite the integral as the integral, across the whole range of theta, of the likelihood (the probability of Y given theta) times our prior. So that's how we can get our prior predictive distribution: by taking the likelihood, multiplying it by our prior, and then integrating over all parameter choices.
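Below is a minimal numerical sketch of the prior predictive calculation just described, for the 10-flip coin example. The Beta(2, 2) prior on theta, the grid, and the use of scipy are illustrative assumptions rather than anything specified in the video.

```python
# Prior predictive: p(y) = integral over theta of p(y | theta) * p(theta) dtheta,
# approximated on a grid for y = 0..10 heads out of n = 10 flips.
# Assumption: a Beta(2, 2) prior on the coin's heads probability theta.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

n = 10                                   # number of flips
theta = np.linspace(0.001, 0.999, 2000)  # grid over the parameter
prior = beta.pdf(theta, 2, 2)            # p(theta)

prior_predictive = np.array([
    trapezoid(binom.pmf(y, n, theta) * prior, theta)  # likelihood times prior, integrated
    for y in range(n + 1)
])

print(prior_predictive.round(3))  # roughly symmetric, centred near y = 5
print(prior_predictive.sum())     # close to 1: a valid probability distribution
```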
So that's the prior predictive distribution. What is meant by the posterior predictive distribution? The idea here is that this is the distribution of data we would expect to obtain if we were to repeat the experiment after we have seen the data from our current experiment; in other words, what sort of values would we predict if we were to run the experiment again? So after we have flipped our coin 10 times, and let's say it comes up heads nine of those times, that might lead us to believe that the coin is in fact biased. If we were to flip the coin another 10 times, we might then expect that the number of heads would be somewhere towards 9, so this second curve here would be our posterior predictive distribution. Just like the prior predictive distribution, it is a valid probability distribution, so the area underneath this curve should integrate to one. I know the way I've drawn them here it doesn't look like that, but they both should integrate to the same value of one.

So how do we calculate this? The idea is that we're trying to calculate the probability of a certain value of new data, which I'm calling Y prime, given that we have observed the current data Y. We can sort of forget about this conditioning and just remember that this is essentially a marginal probability, even though it's conditional, and we can get that marginal by integrating out the joint probability of Y prime and theta (remembering that we're still conditioning on Y) across the whole range of theta. Just like before, we can use the rule of conditional probability to rewrite this as the integral, over the whole range of theta, of the probability of Y prime given theta and Y, times the probability of theta given Y.

And what do we have here inside our integral? The second term is just the posterior distribution which we actually obtained from doing our experiment in the first place. The first term looks a bit more complicated, until you realise that normally, when you condition on theta, the parameter, the new observation is independent of the old observations, because theta tells you everything you need to know about the new observations. So normally we can remove that conditioning, and then this is simply a likelihood. The idea is that if you take the likelihood, multiply it through by the posterior, and integrate over all parameter ranges, that gives you the posterior predictive distribution: the distribution of observations that we would expect for a new experiment, given that we have observed the results of our current experiment.
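Continuing the same assumed setup, here is a sketch of the posterior predictive for a fresh run of 10 flips after observing 9 heads: with the conjugate Beta(2, 2) prior the posterior is Beta(2 + 9, 2 + 1) in closed form, and the posterior predictive is the likelihood integrated against that posterior. Again, the prior choice and the numerical integration are assumptions for illustration, not the video's own code.

```python
# Posterior predictive: p(y' | y) = integral over theta of p(y' | theta) * p(theta | y) dtheta,
# for a new run of n = 10 flips after observing 9 heads in the first 10.
# Assumption: the same Beta(2, 2) prior, giving a Beta(2 + 9, 2 + 1) posterior.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import beta, binom

n, heads = 10, 9
theta = np.linspace(0.001, 0.999, 2000)
posterior = beta.pdf(theta, 2 + heads, 2 + (n - heads))  # p(theta | y)

posterior_predictive = np.array([
    trapezoid(binom.pmf(y_new, n, theta) * posterior, theta)  # likelihood times posterior
    for y_new in range(n + 1)
])

print(posterior_predictive.round(3))  # mass shifted towards y' = 8 or 9
print(posterior_predictive.sum())     # again close to 1
```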

sulgrave_

Which one is the posterior?
p(y'|y) or p(θ|y)?

mattetis

This is very clear and intuitive. Thanks a lot, Sir.
By the way, what software do you use in this video? Something like Notability on iOS? I need to find this kind of software and haven't found it yet.

ArdianUmam

Is P(y, theta) the same as P(y and theta), where they typically use an upside-down U for the "and"?

briansalkas

Isn't P(Y) the denominator in Bayes' rule?

hohinng

I'm a little confused: where does P(Y | theta) come from if we're calculating the prior predictive probability distribution before seeing the data? If it is the likelihood (as is said at 2:35), that means we've already seen the data.

doughnut

I am confused:
1. Are P(y) and P(theta) both priors?
2. And is p(y'|y) the posterior, or is p(theta|y) the posterior?

ywk

Fantastic. I was having a bit of trouble with the manipulation of the conditioning variables; this cleared it up.

NotLegato

About the posterior probability: when you applied the conditional probability rule, how did it change? Shouldn't it be divided by p(y)? I don't understand this part.

sepidet

I feel like you already need to understand Bayes' theorem to get anything from this video. This is more of a "review", not really an introduction.

alexander

At 2:37 you call P(theta) the "prior" and use it to get our "prior predictive distribution." Are these two different things? It would be helpful to explain this.

tuber

Does the theta here mean the population mean?
If so, then what does P(Y, theta) mean?

mikahebat

Isn't your posterior actually a predictive posterior equation, not the true posterior of the parameters?

MattyHild