TidyTuesday: Tuning Pre-Processing Parameters with Tidymodels

In this week's #TidyTuesday video, I show how to tune pre-processing methods for regression models using #Tidymodels. I go over common pre-processing problems such as choosing thresholds and the number of components. I also go over use cases for regression metrics such as RMSE and MAE. I then tune a basic lasso regression model and choose the optimal parameters for a house-pricing model.

Intro: 0:00
Data Partitioning: 1:52
Pre-Process Tuning: 2:50
Defining Model: 8:09
Creating Parameter Grids: 9:30
Model Tuning: 13:25
Regression Metric Explanation: 14:35
Model Evaluation: 17:27
Model Finalization: 22:04
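The steps in the chapters above can be sketched in R roughly as follows. This is a minimal outline, not the video's exact code: the dataset and column names (house_prices, sale_price) and the specific tuned recipe step (PCA components) are placeholders assumed for illustration.

```r
# Tune a recipe parameter (number of PCA components) and a model
# parameter (lasso penalty) at the same time with tidymodels.
library(tidymodels)

split <- initial_split(house_prices)   # data partitioning
train <- training(split)
folds <- vfold_cv(train, v = 5)

# Recipe with a tunable pre-processing parameter
rec <- recipe(sale_price ~ ., data = train) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = tune())

# Lasso regression (mixture = 1) with a tunable penalty
lasso_spec <- linear_reg(penalty = tune(), mixture = 1) %>%
  set_engine("glmnet")

wf <- workflow() %>%
  add_recipe(rec) %>%
  add_model(lasso_spec)

# One grid covering both the recipe and the model parameters
grid <- grid_regular(num_comp(range = c(2, 10)), penalty(), levels = 5)

res <- tune_grid(wf, resamples = folds, grid = grid,
                 metrics = metric_set(rmse, mae))

# Pick the best combination and finalize the workflow
final_wf <- finalize_workflow(wf, select_best(res, metric = "rmse"))
```

Because the recipe lives inside the workflow, tune_grid() searches the pre-processing and model parameters jointly rather than in separate passes.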

#DataScience

Comments

Once again, thank you! Please keep doing more tidymodels screencasts :)

mkklindhardt

Your tutorials are wonderful, Andrew! Please show more regression, and perhaps modeltime models like arima_boost and prophet_boost. Thank you so much, Andrew!!!

hansmeiser

Awesome! I had no idea you could tune recipe and model parameters at the same time. So simple.

VinnieGaul

Wow!! Awesome tutorial! I kept getting NA values when tuning, and filtering by threshold the way you did helped me understand how to deal with that issue. Great idea to use linear regression to tune pre-processing parameters for other models. I'll start using MAE more now too. Super helpful, thanks.

micahshull

I would also like to see you build a custom recipe step from start to finish :-).

hansmeiser

Thanks for another great video. Really enjoying the tidymodels theme. Andrew, do you have any recommended resources for getting up to speed with xgboost in R?

jamesmundy

Hi Andrew,
Do you suggest a similar approach when tuning recipe pre-processing steps for more "black box" models, e.g. XGBoost or random forest?

mkklindhardt

Hey Andrew, can you answer this question?
Let's say the object argument is given a finalized workflow loaded from disk (my_finalized_workflow), including a final recipe with tuned parameters. Should we then pass the preprocessor argument, or is the recipe used automatically by fit_resamples() via the finalized workflow in the object argument? Or should we bake the recipe on my_folds manually? When I use a step_rm(varname) in the recipe and look at the output of fit_resamples()['splits'], it looks like the function did not use any recipe with step_rm(varname) at all.

fit_resamples(
  object = my_finalized_workflow,
  preprocessor,   # needed, or does the workflow supply the recipe?
  resamples = my_folds,
  metrics = NULL,
  control = control_resamples()
)
When I try to bake the data for the folds manually,
train <- train %>% bake(prep(rec), .)
then fit_resamples() gives me:
x Fold1: preprocessor 1/1: Error: Not all variables in the recipe are present in the supplied trainin...
x Fold2: preprocessor 1/1: Error: Not all variables in the recipe are present in the supplied trainin...

hansmeiser

Is it possible to merge two grids (recipeGrid + modelGrid) with different levels/row counts without producing NAs?

hansmeiser

merge() is no longer in the dials namespace.

hansmeiser