Multiple Regression in R, Step-by-Step!!!


For a complete index of all the StatQuest videos, check out:

If you'd like to support StatQuest, please consider...

...or...

...buy my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...

...or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:

#statquest #regression
Comments

Thank you for so easily explaining something that neither my professor nor my book was able to explain well.

tocinoconrad

Using tail alone actually results in an R^2 = 0.83 (p = 6E-4), compared to the adjusted R^2 = 0.79 for the multivariate model. Maybe this could have been added to the video. Nice explanation regardless (y).

kusocm
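A minimal sketch of that comparison, assuming the mouse data from the video is in a data frame called mouse.data with columns size, weight, and tail (the frame and column names are assumptions, not confirmed here):

    ## Fit the single-predictor model and the two-predictor model
    simple.regression   <- lm(size ~ tail, data = mouse.data)
    multiple.regression <- lm(size ~ weight + tail, data = mouse.data)

    ## summary() reports Multiple R-squared, Adjusted R-squared,
    ## and the overall F-test p-value for each fit
    summary(simple.regression)
    summary(multiple.regression)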

Your videos are literally saving my career. I don't know whether words would be sufficient to express my gratitude. I have just one request: will you please upload videos on factor analysis, just like you did for PCA?

aryanverma

Wow, this makes so much more sense than what my professor was trying to say. Thank you!

ciaracuesta

OH BOY, love me some StatQuest. I have a hard time grasping everything during lectures when it's kept abstract, but with these examples and a good sense of didactics this is a piece of cake! Thanks, Josh! Spreading this channel for sure.

wanhope

Thank you so much for these videos! They were very helpful in building an MLM for my MS degree! Thank you!

samwitty

I love your intro. The rest is also good, but the "totes cray cray" bit was funny.

paulti

This was exactly what I was looking for. Thank you so much! Very well explained :) Saving my thesis :D

ELECTRHEART

Thanks. I spent like 5 hours trying to understand this.

hustle-knxp

Yeah, correct: adding weight as a new variable actually reduces the adjusted R^2 value, hence it is better to use tail alone for forecasting, as it has the higher adjusted R^2 value.

bibeksharma
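To pull just the adjusted R^2 values out for a side-by-side check, one option (same assumed mouse.data frame as above):

    ## The object returned by summary() stores adjusted R-squared directly
    summary(lm(size ~ tail, data = mouse.data))$adj.r.squared
    summary(lm(size ~ weight + tail, data = mouse.data))$adj.r.squared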

Stat teachers have the best personalities; try and change my mind.

danialdunson

Great job! Exactly what I was searching for!

ilhomsadriddinov

Great video as usual.
Can you please explain survival analysis and Cox regression models, and how to run them in R?
I think you would be the best person to explain them.
Thanks in advance.

muhammedhadedy

Hi Josh - I'm confused. In another video on linear regression, you mentioned that when fitting a 'plane' in multiple regression, if adding a second variable doesn't reduce the SSE (i.e., explain more variation in y), then the coefficient of var2 would be set to 0 and the equation essentially ignores it.

But this video seems to say something slightly different: if an additional variable doesn't add more explanatory power, then OLS will still assign a coefficient value, but the p-value of that coefficient will be high, signifying that the variable isn't significant.

One way to bridge the difference is that the second variable still reduces the SSE (i.e., the fit is better using the plane vs. the simple line), but if we plot the second variable alone on a line, the residuals (errors) would be so large that we're not sure if the pattern we're seeing (described by the coefficient) is due to pure random chance. Have I got this right?

joerich
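The second description is the accurate one: OLS almost never estimates a coefficient as exactly zero. A useless predictor typically gets a small estimate with a large standard error, and therefore a high p-value. A simulation makes this concrete (all names and numbers below are made up for illustration):

    ## y depends only on x1; x2 is pure noise, unrelated to y
    set.seed(42)
    x1 <- rnorm(50)
    x2 <- rnorm(50)
    y  <- 2 * x1 + rnorm(50)

    ## x2 still receives a small, nonzero coefficient estimate,
    ## but its large p-value flags it as non-significant
    summary(lm(y ~ x1 + x2))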

This is exactly what I needed right now. Thanks a lot!

anggunsausan

Thank you. What puzzles me is that since weight and tail length are highly correlated, I would think using either one of them in the model is fine and the two predictors are interchangeable. But the results showed otherwise, and I am wondering what made the difference. Based on the previous lecture, I am guessing it is because the sum of squared residuals of the weight-only model is significantly bigger than that of the tail-only model? Also, I remember the term multicollinearity: if one predictor is highly correlated with the other, the matrix inversion in multiple regression can fail. I wonder if there is any indicator of multicollinearity in the output here.

Uynix
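summary() itself does not report a multicollinearity diagnostic, but two common checks are the correlation between the predictors and the variance inflation factors (VIFs); a sketch, again assuming the hypothetical mouse.data frame from above:

    ## Pairwise correlation between the two predictors
    cor(mouse.data$weight, mouse.data$tail)

    ## Variance inflation factors (requires the car package);
    ## values well above ~5-10 are a common rule-of-thumb warning sign
    library(car)
    vif(lm(size ~ weight + tail, data = mouse.data))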

Great as usual. Can you do a quest for mixed models?

dalhoomist

I love your video; I like the way you explain regression. Thank you!

rizuri

These are great videos. So grateful for the clarity. Liked and Subscribed, keep going!!

ryantatton

At 7:20 you mentioned we can use tail alone, rather than weight too, to save time. How do we quantify the cost (e.g., increased error, reduced R-sq, or some more interpretable business metric) of not using weight? In practice, people don't make decisions based on p-values alone; there must be some translation to more practical factors along the way.
For the first two metrics (error and R-sq), must we re-run the model, redefining the RHS of lm() each time we want to test a different predictor combination, or can we say something about other predictor combinations from the summary() of one fit alone? (Both seem true, which feels contradictory.) What would you say about the third metric (translating p-values into practical considerations)?
I'm also not sure whether using tail alone definitely leads to a worse or equal result (as measured by error, or some other metric) than tail + weight. If I reason from the fact that adding predictors never reduces R-sq, then is this statement true?

Han-veuh
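On the first two metrics: you do need to fit each candidate model (each right-hand side of lm()) to get its exact R^2 and residual error, but nested models can then be compared with a single F-test via anova(). A sketch, once more assuming the hypothetical mouse.data frame:

    ## Fit the reduced and full (nested) models
    tail.only <- lm(size ~ tail, data = mouse.data)
    full      <- lm(size ~ weight + tail, data = mouse.data)

    ## F-test: does adding weight significantly reduce the residual error?
    anova(tail.only, full)

    ## Residual standard error, in the units of the response --
    ## often easier to translate into practical cost than a p-value
    sigma(tail.only)
    sigma(full)

And on the last point: plain R^2 never decreases when a predictor is added, so tail + weight fits the training data at least as well as tail alone; it is adjusted R^2 and out-of-sample error that can get worse.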