13. Logistic Regression Part 3

Introduction to Statistical Modelling
With Dr Helen Brown, Senior Statistician at The Roslin Institute, December 2015
Content:
Can we create a model to predict risk of death?
-Find the logistic regression model that best predicts death within 5 years
-Careful consideration is needed on:
--- how to select the best set of variables to include in the model
--- how to avoid over-fitting (irrelevant variables add noise)
-Potential strategies:
--- Include all available variables in model (approx 25)
--- Include only variables thought to be associated with mortality
--- Forward selection: Add variables one by one until no remaining variable has a p-value less than a set limit
--- Backward selection: Include all variables and delete them one by one until all remaining variables have a p-value less than a set limit
--- Stepwise selection: Mixture of forward and backward selection
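
The backward selection strategy listed above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the lecture: the function name backward_eliminate, the callback p_values_in_model (a stand-in for refitting the logistic regression and reading off each variable's p-value), the variable names, and all p-values are made up for the example.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def backward_eliminate(
    variables: Iterable[str],
    p_values_in_model: Callable[[Sequence[str]], Dict[str, float]],
    max_p: float = 0.10,
) -> List[str]:
    """Start with all variables and delete the least significant one at a
    time until every remaining variable has a p-value below max_p."""
    kept = list(variables)
    while kept:
        # Hypothetical callback: refit the model on `kept`, return p-values.
        p_values = p_values_in_model(kept)
        worst = max(kept, key=lambda v: p_values[v])
        if p_values[worst] < max_p:
            break  # all remaining variables meet the limit
        kept.remove(worst)
    return kept


def toy_model_p_values(in_model: Sequence[str]) -> Dict[str, float]:
    """Made-up p-values standing in for a real logistic regression fit."""
    p = {"age": 0.001, "smoker": 0.02, "bmi": 0.35, "height": 0.60}
    # Pretend 'smoker' looks weaker while the correlated 'bmi' is included.
    if "bmi" in in_model:
        p["smoker"] = 0.12
    return {v: p[v] for v in in_model}


kept = backward_eliminate(["age", "smoker", "bmi", "height"], toy_model_p_values)
print(kept)
```

Because the p-values are recomputed after every deletion, a variable that fails the limit in the full model (here the made-up 'smoker') can still be retained once a correlated variable has been removed, which is why variables are deleted one at a time rather than all at once.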

Include all available variables
-Potential for over-fitting
-Inclusion of effects with high p-values (non-significant effects) may add noise
-Not an ideal strategy, particularly if there are many independent variables

Forward Selection: Add variables one-by-one in order of significance
-Statistical packages often have an option to do this
-Set a maximum acceptable p-value; here the maximum is set to p=0.10
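
The forward selection loop described above might look roughly like this in Python. It is a minimal sketch under stated assumptions, not the output of any particular package: forward_select and the callback p_value_if_added (a stand-in for fitting the logistic regression with one extra variable and reading off that variable's p-value) are hypothetical names, and the variables and p-values are invented for the example. The limit of 0.10 is the one given above.

```python
from typing import Callable, Iterable, List, Sequence


def forward_select(
    candidates: Iterable[str],
    p_value_if_added: Callable[[Sequence[str], str], float],
    max_p: float = 0.10,
) -> List[str]:
    """Add variables one by one, most significant first, until no remaining
    variable has a p-value below max_p."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining:
        # Hypothetical callback: p-value of each candidate when it is added
        # to the model that already contains `selected`.
        p_values = {v: p_value_if_added(selected, v) for v in remaining}
        best = min(remaining, key=lambda v: p_values[v])
        if p_values[best] >= max_p:
            break  # nothing left that meets the limit
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_p_value(selected: Sequence[str], candidate: str) -> float:
    """Made-up p-values standing in for a real logistic regression fit."""
    p = {"age": 0.001, "smoker": 0.02, "bmi": 0.04, "height": 0.60}[candidate]
    # Pretend 'bmi' loses significance once 'age' is already in the model.
    if candidate == "bmi" and "age" in selected:
        p = 0.35
    return p


chosen = forward_select(["age", "smoker", "bmi", "height"], toy_p_value)
print(chosen)
```

In this toy run 'bmi' would pass the limit on its own, but it is not selected because its made-up p-value rises once 'age' has entered the model; the p-values are re-evaluated at every step.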