13. Logistic Regression Part 3
Introduction to Statistical Modelling
With Dr Helen Brown, Senior Statistician at The Roslin Institute, December 2015
*Recommended YouTube playback setting for the best viewing experience: 1080p HD
************************************************
Content:
Can we create a model to predict risk of death?
-Find the logistic regression model that best predicts death within 5 years
-Careful consideration needed on:
--- how to select the best set of variables to include in the model
--- how to avoid over-fitting (irrelevant variables add noise)
-Potential strategies:
--- Include all available variables in model (approx 25)
--- Include only variables thought to be associated with mortality
--- Forward selection: Add variables one by one until no remaining candidate has a p-value less than a set limit
--- Backward selection: Include all variables and delete them one by one until all remaining variables have a p-value less than a set limit
--- Stepwise selection: Mixture of forward and backward selection
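For reference, every strategy above chooses a subset S of the available variables for a model of the following form. The notation (p_i, x_ij, beta_j, S) is chosen here for illustration and is not taken from the lecture slides.

```latex
\[
\log\frac{p_i}{1 - p_i} \;=\; \beta_0 + \sum_{j \in S} \beta_j x_{ij},
\qquad
p_i = \Pr(\text{death within 5 years} \mid x_i)
\]
```

Including all variables corresponds to S containing all of the approximately 25 candidates. The selection strategies try to keep S small enough to avoid over-fitting.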
Include all available variables
-Potential for over-fitting
-Inclusion of effects with high p-values (little evidence of association) may add noise
-Not an ideal strategy, particularly when there are many independent variables
Forward Selection: Add variables one-by-one in order of significance
-Packages often have option to do this
-Set a maximum p-value acceptable, here set maximum to p=0.10
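The forward selection procedure described above can be sketched in plain Python. This is a minimal illustration, not the package routine used in the lecture. It assumes that each candidate's p-value comes from a likelihood-ratio test with 1 degree of freedom, whereas statistical packages may use score or Wald tests instead. The variable names (age, smoker, noise1, noise2) and the simulated data are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def linear_predictor(beta, row):
    # Clamp to keep exp() within floating-point range.
    return max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, row))))

def fit_logistic(X, y, iters=25):
    """Fit by Newton-Raphson. Each row of X must start with the intercept 1.0.
    Returns (coefficients, maximised log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        # Tiny diagonal term guards against a singular matrix; it does not
        # change the solution, because the gradient is still zero there.
        hess = [[1e-9 if a == b else 0.0 for b in range(k)] for a in range(k)]
        for row, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-linear_predictor(beta, row)))
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for b in range(k):
                    hess[a][b] += p * (1.0 - p) * row[a] * row[b]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    ll = 0.0
    for row, yi in zip(X, y):
        z = linear_predictor(beta, row)
        ll += yi * z - math.log(1.0 + math.exp(z))
    return beta, ll

def design(data, names, n):
    """Build design-matrix rows: intercept plus the named variables."""
    return [[1.0] + [data[v][i] for v in names] for i in range(n)]

def forward_select(data, y, max_p=0.10):
    """Add the most significant candidate each round; stop when no
    candidate has a p-value below max_p."""
    n = len(y)
    selected = []
    _, ll_current = fit_logistic(design(data, selected, n), y)
    while True:
        best = None
        for name in data:
            if name in selected:
                continue
            _, ll_new = fit_logistic(design(data, selected + [name], n), y)
            lr = max(0.0, 2.0 * (ll_new - ll_current))
            p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square tail, 1 df
            if best is None or p_value < best[1]:
                best = (name, p_value, ll_new)
        if best is None or best[1] >= max_p:
            return selected
        selected.append(best[0])
        ll_current = best[2]

# Simulated example (hypothetical data): only age and smoker affect the outcome.
random.seed(0)
n = 400
data = {
    "age": [random.gauss(0, 1) for _ in range(n)],
    "smoker": [float(random.random() < 0.5) for _ in range(n)],
    "noise1": [random.gauss(0, 1) for _ in range(n)],
    "noise2": [random.gauss(0, 1) for _ in range(n)],
}
y = []
for i in range(n):
    z = -0.5 + 1.5 * data["age"][i] + 1.0 * data["smoker"][i]
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-z)) else 0)

selected = forward_select(data, y, max_p=0.10)
print("Selected variables, in order of entry:", selected)
```

With a limit of p=0.10, an irrelevant variable still has roughly a 1 in 10 chance of passing the test at each step, so a noise variable may occasionally be selected. This is one reason the choice of limit needs care.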