video 10.2. post hoc tests & family-wise error

closed captioning text:
If you do an ANOVA and it is statistically significant, that is, if you reject the null hypothesis, then you have to do a post hoc test to tell which of the many means are different from one another. But you might wonder why you even have to do an ANOVA at all. Why not just do t-tests that compare each pair of groups? I will briefly explain that before I introduce post hoc tests.

So here is the reason you cannot just compare all the different means directly. If we have intro, research design, and statistics, we have three different samples. If we do t-tests, the first test compares intro with research design, with an alpha of .05. A second test compares research design with statistics; that one has its own alpha of .05. We also have to compare intro with statistics, and that third test again has an alpha of .05. So we would have to do three statistical tests instead of one. If we do three tests instead of one, our alpha is not going to be .05 anymore. With just one test, alpha is .05, but with three means and three comparisons, the probability of a type 1 error rises to .143. You might think it should be .05 times 3, but some of the time you would get a false positive on two different tests, or even all three, at once, so the probabilities overlap. That is why it is .143 and not .15.
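As a quick numeric check of that .143 figure, here is a minimal sketch in Python (the formula 1 - (1 - alpha)^c for c independent comparisons is the standard family-wise calculation; the numbers just mirror the lecture):

```python
# Family-wise error rate for c independent tests, each run at level alpha:
# P(at least one false positive) = 1 - P(no false positives) = 1 - (1 - alpha)**c
alpha = 0.05

for c in (1, 2, 3):
    fwer = 1 - (1 - alpha) ** c
    print(f"{c} comparison(s): family-wise alpha = {fwer:.3f}")

# Output for c = 3 is 0.143, not 0.15, because some of the time two or even
# all three tests produce a false positive at the same time.
```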

This problem of inflated alpha is called the "family-wise error rate": your chance of a false positive is higher when you do more comparisons. The family-wise error rate is the probability of a type 1 error for a set (that is, a "family") of statistical tests. This is why, when we are comparing three different means, we do an ANOVA first rather than three t-tests. As you will see in a little bit, the post hoc test is basically doing three different t-tests, comparing each pair, though slightly modified.

So when you are doing ANOVAs and comparing three or more means, the first thing you do is the regular ANOVA that we talked about in the previous video. That is called the "omnibus ANOVA" because this one analysis of variance compares all three means at the same time. Only if you reject the null hypothesis, that is, if your p-value is less than alpha and your test statistic is more extreme than your critical value, can you do the post hoc tests, which tell you which means are different from one another. Post hoc means "after the fact" or "after this": after the overall ANOVA is significant, you do the pairwise comparisons. This strategy keeps the overall alpha rate at .05, like you want it to be.
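As a hedged sketch of that two-stage strategy in Python, using scipy.stats.f_oneway for the omnibus test (the three score lists below are invented placeholder data, not the lecture's):

```python
from scipy import stats

# Hypothetical exam scores for the three classes (placeholder data).
intro = [95, 102, 98, 101, 99, 105]
research_design = [97, 100, 103, 96, 104, 100]
statistics = [118, 121, 115, 124, 119, 122]

alpha = 0.05

# Step 1: the omnibus ANOVA compares all three means at the same time.
f_stat, p_value = stats.f_oneway(intro, research_design, statistics)
print(f"omnibus ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

# Step 2: only if the omnibus test is significant do we run post hoc
# pairwise comparisons to learn WHICH means differ.
if p_value < alpha:
    print("significant: proceed to post hoc tests (e.g., Tukey's HSD)")
else:
    print("not significant: stop; do not run post hoc tests")
```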

There are a lot of different post hoc tests available for researchers to use. I am going to stick with Tukey's honestly significant difference test, also known as Tukey's HSD, because it is very similar to a t-test and it is commonly used. So that is what we will do next. Recall that the first requirement is that the omnibus ANOVA is statistically significant; the term "statistically significant" means that you have rejected the null hypothesis. You only do these post hoc tests if that is the case. If it is significant, you do your post hoc test. Tukey's is a modified t-test; that is the easiest way to think about it: a modified independent-groups t-test.

One reason it is different from the t-test is that with a t-test you only had two groups, but having done an ANOVA, we have three groups, and therefore three different estimates of the population's variance, and we are going to use all of them. For our standard error, we take the within-groups variance, divide it by the sample size, and take the square root. Remember, this works only if your sample sizes are all the same, which they are for us. The within-groups variance is the population variance estimate; to turn it into a standard error, we divide it by the sample size and take the square root. The other way it differs from the t-test is that it uses a modified critical-values table, which keeps our overall alpha at .05 or so.
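A minimal sketch of that standard error, assuming equal group sizes as the lecture requires (the names ms_within and n are placeholders for the within-groups variance estimate and the per-group sample size):

```python
import math

def tukey_se(ms_within: float, n: int) -> float:
    """Standard error for Tukey's HSD: sqrt(within-groups variance / n).

    Valid only when every group has the same sample size n.
    """
    return math.sqrt(ms_within / n)

# With the lecture's numbers (variance 100, n = 33 per group):
# tukey_se(100, 33) -> 1.7408..., the 1.74 used below.
```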
[closed captioning continued in the comments]
Comments

Great lecture! This channel is a hidden gem.
So if my ANOVA was significant but my post hoc tests with Bonferroni adjustment for the p-values were all non-significant, what does that mean? How can I report this?
Thanks in advance!

Hasan-qguh
Author

[closed captioning continued]
So the formula is kind of like a t statistic. For the honestly significant difference test, the HSD value is approximately one of the means minus another one of the means, divided by the standard error calculated the way I just mentioned. You do this for all pairwise comparisons: intro versus research design, research design versus statistics, and intro versus statistics.

I will need to calculate this standard error. We calculated the within-groups variance estimate earlier: remember, you square each group's standard deviation to turn it into a variance, take those three variances, average them, and you get 100. That goes on top. Divide that by our sample size, which is 33, and then take the square root. By my calculations that is 1.74. So now we have the denominator that we will use for all of our pairwise comparisons.

Next, we calculate all of our different HSD values; they work like t values. For the first one, HSD 1, we compare intro with research design: 100 minus 100, divided by 1.74, which gives us zero. Since this behaves like a t, we already know these two are not going to be significantly different. We have to do this for all pairwise comparisons, so for the second one we take intro again: 100, minus the statistics mean of 120, divided by 1.74. Whether it is 100 minus 120 or 120 minus 100 is arbitrary; it is a two-tailed test, so it does not matter which way you do it. Here we get -20 divided by 1.74, which equals -11.49. Then our third pair is research design minus statistics, divided by 1.74. Those means are the same as in the previous comparison, so we get the same result in magnitude: 11.49.
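Putting the lecture's numbers together, here is a short sketch that reproduces those three HSD values (means of 100, 100, and 120; within-groups variance 100; 33 students per group, all taken from the lecture):

```python
import math
from itertools import combinations

means = {"intro": 100, "research design": 100, "statistics": 120}
ms_within = 100  # average of the three within-group variance estimates
n = 33           # per-group sample size (equal across groups)

se = math.sqrt(ms_within / n)  # ~1.74
print(f"standard error = {se:.2f}")

# HSD statistic for each pair: (mean_i - mean_j) / se.
# The sign is arbitrary (two-tailed); only the magnitude matters.
for (name_i, m_i), (name_j, m_j) in combinations(means.items(), 2):
    print(f"{name_i} vs {name_j}: HSD = {(m_i - m_j) / se:.2f}")

# intro vs research design:      HSD = 0.00
# intro vs statistics:           HSD = -11.49
# research design vs statistics: HSD = -11.49
```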

The next thing we need to do is look up the critical value in the table. I did that earlier, and for this test it is 3.40. For any HSD value more extreme than 3.40, we reject the null hypothesis that those two means are the same. Notice that we are really treating these as three separate null hypotheses at this point. So now we have our critical value.
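If you would rather not use a printed table, the same kind of critical value comes from the studentized range distribution, which is what Tukey HSD tables are built from; scipy.stats.studentized_range provides it. A hedged sketch (the exact value depends on the error degrees of freedom; the df = 60 used here is a common table row that yields the lecture's 3.40):

```python
from scipy.stats import studentized_range

alpha = 0.05
k = 3     # number of group means being compared
df = 60   # error degrees of freedom (a typical table row; in general N - k)

q_crit = studentized_range.ppf(1 - alpha, k, df)
print(f"critical value ~ {q_crit:.2f}")  # about 3.40
```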

For each comparison we decide whether to reject the null hypothesis or fail to reject it. For the first one, the HSD of 0, we fail to reject, because it is less extreme than 3.40. The second one, -11.49, is more extreme, so we reject. The third, 11.49, is also more extreme, so we reject the null hypothesis there as well. Once we have decided for each comparison, we interpret our findings: we find out what we learned from the study.

To interpret our findings: the first test compared intro and research design, and it was not significant. We failed to reject the null hypothesis, so this result is inconclusive. We do not know whether intro and research design are equally difficult or whether one is more difficult than the other. The next one compared intro and stats, and there we did reject the null hypothesis. So we look at the means and see that statistics has a higher mean than intro, which we interpret as the exam being easier for statistics than it was for intro. Our third test compared statistics and research design, and that was also statistically significant; we rejected the null hypothesis, so we interpret it as statistics being easier than research design. So overall, our omnibus ANOVA was statistically significant, and our post hoc tests then showed that statistics was easier than both research design and intro, but we could not conclude anything about how those two compared with one another.
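In practice you rarely do this arithmetic by hand: statsmodels bundles the whole post hoc step in pairwise_tukeyhsd. An end-to-end sketch with simulated data (the scores below are invented to match the lecture's setup of means near 100, 100, and 120 with a variance of 100; they are not the lecture's raw data):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)

# Hypothetical per-student exam scores, 33 per class.
scores = np.concatenate([
    rng.normal(100, 10, 33),  # intro
    rng.normal(100, 10, 33),  # research design
    rng.normal(120, 10, 33),  # statistics
])
groups = ["intro"] * 33 + ["research design"] * 33 + ["statistics"] * 33

# Runs every pairwise Tukey HSD comparison while holding the
# family-wise alpha at 0.05.
result = pairwise_tukeyhsd(endog=scores, groups=groups, alpha=0.05)
print(result.summary())
```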

statisticsforpsychology