Tuning random forest hyperparameters with tidymodels

Follow along to see how to tune hyperparameters and then use the final best model, using #TidyTuesday data on trees around San Francisco.

Comments

The command you have of R is just incredible. Thanks for sharing quality content like this.

lucianobatista

Wonderful tutorial as usual. Thanks a ton Ms. Silge.

SadatQuayiumApu

Great job, Julia. Really useful work; you answered many of my questions.

shahrdadshadab

Hi Julia, thank you for sharing this great tutorial! Is it possible to tune hyperparameters using a genetic algorithm in tidymodels?

yangyang

I really like your videos! You're helping me a lot in my masters lol 
It'd be awesome if you made one about bagging with decision trees and tuning parameters as well. 
Keep up the great work!

vivi

Very interesting and informative exercise. I have one question: why not use a package like VIM to get a quick overview of missing values in the dataset, like the plot that the aggr() function in VIM generates?

haraldurkarlsson

Thank you so much! Best video about tuning RF.

TheTeksan

Hi Julia and everyone,

Hope you had a good weekend. I am working with a large geo-referenced point dataset with biophysical variables (e.g. soil pH, precipitation, etc.) and would like to test for spatial autocorrelation in my data. Is there any tidy-friendly way to test for this kind of autocorrelation? Since it matters for my subsequent machine learning regression analysis, I need to take it into account so that I do not produce models that are erroneous in predicting the outcome variable.

Thank you

mkklindhardt

Hi Julia - when you downsample in recipes, is this preprocessing also applied to 'new data' when you use the predict function?

oliverarmstrong

Hey Julia,
You mentioned in there something about tuning parameters for recipes. Is there an easy way to do this in tidymodels, and if there is, would you be able to point me in the right direction for a good reference? Thank you!

matthewryan

Hey Julia,
thank you for this video. This is helping me a lot with a course in my bachelor's degree! I am using your code right now and have been stuck for at least 3 hours at the first training part with grid = 20. Do you know any method to track the progress? I would really love to know whether it will take another few hours or days.
It would be cool if you could give a quick answer and maybe tell me how long the training took for the case in the video.
Thank you and have a great day.

kaihennig

@ 50:00, this is a VIP based on the tuned model using the *training data*. Isn't it good practice to create a final model using *all data*? And if that is the case wouldn't it be more beneficial to report the VIP using that new model? Does last_fit() also create a new model using *all data*?

eddytheflow

Hey, huge thanks for your work! Just some questions at 53:35: why did you set importance to "permutation"? And what kind of fit (algorithm or mathematical model) is used to determine the order of variable importance?

hunlee

Thank you for such a helpful tutorial! If I have a new dataset (with the same column setup) and I want to apply the model I built to that dataset, what should I do?

wayanadolan

Hello Julia,
I have a question: when I apply my random forest model to the test data set, some levels of the factor variables don't exist. How can I solve this?

Thanks for everything you have shared with us; it is very helpful and we enjoy it :)

abdelouahebhocine

When and where in our code can/should we call 'finalize(mtry(), trainData)'? This does not seem to be the intended location:
daGrid <- tidyr::crossing(
mtry = dials::finalize(mtry(), trainEn)
, trees = c(30) # max 2000
, min_n = c(5) # max 40
, tree_depth = c(2) # max 15
, learn_rate = c(0.1, 0.5) # max -1
, loss_reduction = # max 1.5
, sample_size = c(0.5, 0.01) # max 1
)

hansmeiser

Thank you for the tutorial and I do have a question. How do I plot the final AUC curve?

jasonosman

Hi again Julia,

I was wondering if tidymodels has a specific framework in place for dealing with spatial (or temporal) autocorrelation in data?

mkklindhardt

How long did you wait? I am curious :) Mine is taking too long... 38:30

jamespaz

Hi Julia, thank you so much for your videos. I have a question for you: is there a way to restrict the values in grid_regular() to a specific set of values, e.g. c(500, 1000, 1500), or something like seq(400, 2400, by=200)? It looks like the param function expects a range with min and max only. I'm pretty sure there is a way, but even after googling I could not figure it out. Can you help me?

AndreaDalseno