Partial dependence plots for Mario Kart world records

Comments

Hey Julia, great screencast as always! You could also consider the Matthews correlation coefficient (MCC) when selecting the best model. It's a nice metric, implemented as 'mcc' in yardstick. It's a bit stricter, since it gives a high score only if the model correctly classifies a high percentage of both the negative cases and the positive ones.

alexandroskatsiferis
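
As a rough illustration of the suggestion above, here is a minimal sketch that computes accuracy and MCC side by side with yardstick. The predictions are made up for illustration (they are not from the screencast), and the names preds and shortcut_metrics are hypothetical.

```r
library(yardstick)

# Toy predictions, made up for illustration: 8 "No" and 2 "Yes" records,
# with a model that finds only one of the two "Yes" cases.
preds <- data.frame(
  truth    = factor(c(rep("No", 8), "Yes", "Yes"), levels = c("No", "Yes")),
  estimate = factor(c(rep("No", 8), "Yes", "No"),  levels = c("No", "Yes"))
)

# Bundle both metrics so they are computed in a single call.
shortcut_metrics <- metric_set(accuracy, mcc)
res <- shortcut_metrics(preds, truth = truth, estimate = estimate)
res
# accuracy is 9/10 = 0.9; MCC is 8 / sqrt(9 * 8 * 2 * 1) = 2/3,
# which is lower because half of the "Yes" cases were missed.
```

A metric set like this can, as far as I know, also be passed to the metrics argument of the tune functions, so that candidate models can be ranked by MCC instead of accuracy.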

Is the reason you chose accuracy over a metric like precision/recall because the dataset was relatively balanced?

AshBlossomWorshiper

Hi Julia, thank you so much for all your great and informative screencasts! I have a question about the partial dependence plots. In the pdp package in R, there is an option to add rugs to the x-axis that display the deciles of the predictor's distribution, because, according to the package manual, "It is not wise to draw conclusions from PDPs in regions outside the area of the training data". I wonder if this option is also available in DALEX?

hamidehmoayyed
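
Whether DALEX offers such an option is not confirmed here. As a package-independent sketch of the idea in the question above, the code below computes a partial dependence curve by hand and adds a decile rug with ggplot2. The model (a linear model on the built-in mtcars data) and the names fit, pd and deciles are assumptions chosen only to keep the example self-contained.

```r
library(ggplot2)

# Stand-in model on a built-in dataset (not the Mario Kart model).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Partial dependence by hand: fix hp at each grid value for every row,
# predict, and average the predictions.
grid <- seq(min(mtcars$hp), max(mtcars$hp), length.out = 25)
pd <- data.frame(
  hp   = grid,
  yhat = sapply(grid, function(v) {
    mean(predict(fit, newdata = transform(mtcars, hp = v)))
  })
)

# Deciles of the training predictor, drawn as a rug along the x-axis.
deciles <- data.frame(
  hp = quantile(mtcars$hp, probs = seq(0.1, 0.9, by = 0.1))
)

p <- ggplot(pd, aes(hp, yhat)) +
  geom_line() +
  geom_rug(data = deciles, aes(x = hp), sides = "b", inherit.aes = FALSE) +
  labs(x = "hp", y = "Average predicted mpg")
p
```

Stretches of the curve where the rug marks are far apart are backed by little training data and deserve more caution.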

Julia, your videos are helping me so much in grad school, thanks so much! Is it always the case that vfold_cv has a higher ratio of training to assessment, and is there a good threshold for saying 'I should really switch to bootstraps, this vfold is too small'? (I noticed you did the same thing in the IKEA furniture video.) Also, would there be much difference here if you used a different tree-based model, like a ranger random forest via rand_forest, as in the IKEA video? (And how would you decide which to use?)

JJManioke
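
To make the training/assessment ratio in the question above concrete, here is a small sketch that compares the split sizes produced by vfold_cv and bootstraps from rsample. It uses the built-in mtcars data (32 rows) as a stand-in, and the object names are hypothetical.

```r
library(rsample)

set.seed(123)
folds <- vfold_cv(mtcars, v = 10)
boots <- bootstraps(mtcars, times = 10)

# Number of rows in the analysis (training) and assessment sets per resample.
fold_analysis   <- sapply(folds$splits, function(s) nrow(analysis(s)))
fold_assessment <- sapply(folds$splits, function(s) nrow(assessment(s)))
boot_analysis   <- sapply(boots$splits, function(s) nrow(analysis(s)))
boot_assessment <- sapply(boots$splits, function(s) nrow(assessment(s)))

# With v = 10, each fold holds out about a tenth of the rows.
fold_assessment

# Each bootstrap analysis set has as many rows as the data (drawn with
# replacement); the assessment set is whatever rows were never drawn.
boot_analysis
boot_assessment
```

On a small dataset the v-fold assessment sets contain only a handful of rows each, which is one reason bootstrap resamples can be attractive there.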

I wanted to see who holds the top record for each track as of the latest date we can see.

records %>%
  group_by(track) %>%
  filter(date == max(date))

There are 16 tracks so we should have 16 records if we're grouping by track, right? However, this is not the case.

Notice there are many duplicate records that show shortcut == yes AND shortcut == no for otherwise identical rows.

For example, in rows 4:7 for Kalimari Desert, Dan has 2 records set on the same day, with 2 observations for each record: one with shortcut == yes and one with shortcut == no.
I am pretty sure you cannot get exactly the same time with and without shortcuts. Maybe I am wrong, but these times are identical down to the thousandth of a second.

Then again, look at rows 12:13:

   track         type       shortcut player system_played date       time_period time  record_duration
12 Wario Stadium Single Lap No       Dan    PAL           2021-01-26 1M 25.82S   85.82 31
13 Wario Stadium Single Lap Yes      Dan    PAL           2021-01-26 1M 25.82S   85.82 31

I think that to get accurate predictions we have to weed out the incorrect duplicates. How do we know which observation is the correct one to use for predicting shortcut?

You can see right at 4:00 in your video that the single lap records are almost identical for No and Yes; they're duplicate records. This cannot be correct. It makes sense for the three lap times, where using shortcuts significantly reduces the time, but the single lap records are almost completely duplicated.

Therefore, shortcut should only be predicted for records set on three lap runs, and single lap runs should be omitted.

This definitely needs to be cleaned up before any prediction is performed. Make sense?

My-NaMeS_jEfF
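
One way to check the duplication described above is to flag rows that match on every column except shortcut. The sketch below does this with dplyr on a tiny sample: the first two rows are the Wario Stadium rows quoted in the comment, the third row is hypothetical and included only to show a record that is not flagged, and the column names are assumed from the printed rows.

```r
library(dplyr)

records_sample <- data.frame(
  track    = c("Wario Stadium", "Wario Stadium", "Wario Stadium"),
  type     = c("Single Lap", "Single Lap", "Three Lap"),
  shortcut = c("No", "Yes", "Yes"),
  player   = c("Dan", "Dan", "Example"),       # "Example" is hypothetical
  date     = as.Date(c("2021-01-26", "2021-01-26", "2021-01-26")),
  time     = c(85.82, 85.82, 100.00)           # third value is hypothetical
)

# Group by every column except shortcut, then keep groups in which
# both shortcut values occur for otherwise identical rows.
suspect <- records_sample %>%
  group_by(across(-shortcut)) %>%
  filter(n_distinct(shortcut) > 1) %>%
  ungroup()

suspect
```

Counting the flagged rows by type (for example with count(type)) would show whether the pattern really is limited to single lap records before any rows are dropped.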

Miss Julia, where do you live and where did you study? Please let me know. What I mean is, I want to know what you have to do to become a data

Dolandtromm