Partial dependence plots for Mario Kart world records


Julia Silge

Related videos

Comments

Hey Julia, great screencast as always! You could also consider the Matthews correlation coefficient (MCC) when selecting the best model. It's a nice metric, implemented as 'mcc' in yardstick. It is stricter than accuracy because it produces a high score only if the model correctly classifies a high percentage of the negative cases and a high percentage of the positive ones.

alexandroskatsiferis
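The MCC suggestion can be made concrete with the standard formula applied to confusion-matrix counts. This is a minimal base R sketch: mcc_from_counts() is a hypothetical helper (in a tidymodels workflow, yardstick's mcc() is the function to reach for), and the counts are made-up illustration values for an imbalanced problem, not numbers from the Mario Kart data.

```r
# MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
mcc_from_counts <- function(tp, tn, fp, fn) {
  numerator <- tp * tn - fp * fn
  denominator <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  # Assumed convention: return 0 when any margin of the table is empty
  if (denominator == 0) return(0)
  numerator / denominator
}

# Hypothetical imbalanced example: 10 positives, 90 negatives
tp <- 5; fn <- 5; fp <- 5; tn <- 85

accuracy <- (tp + tn) / (tp + tn + fp + fn)
mcc_value <- mcc_from_counts(tp = tp, tn = tn, fp = fp, fn = fn)

print(c(accuracy = accuracy, mcc = mcc_value))
```

With these counts accuracy is 0.9 while MCC is 4/9 (about 0.44), because only half of the positive cases are found; this is the "stricter" behavior the comment describes.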

Is the reason you chose accuracy over a metric like precision/recall that the dataset was relatively balanced?

AshBlossomWorshiper
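One way to look at this question is to check the class proportions first and then compare accuracy with precision and recall on the same predictions. The sketch below uses base R; classification_summary() is a hypothetical helper (yardstick provides accuracy(), precision() and recall() for real workflows), and the truth/estimate vectors are toy values.

```r
# Accuracy, precision and recall for a two-class problem, given the positive level
classification_summary <- function(truth, estimate, positive = "Yes") {
  tp <- sum(truth == positive & estimate == positive)
  fp <- sum(truth != positive & estimate == positive)
  fn <- sum(truth == positive & estimate != positive)
  c(
    accuracy = mean(truth == estimate),
    precision = tp / (tp + fp),
    recall = tp / (tp + fn)
  )
}

# Toy predictions (hypothetical)
truth    <- c("Yes", "Yes", "Yes", "No", "No", "No", "No", "No")
estimate <- c("Yes", "No",  "Yes", "No", "No", "Yes", "No", "No")

# How balanced are the classes?
print(prop.table(table(truth)))

metrics <- classification_summary(truth, estimate)
print(metrics)
```

When the classes are roughly balanced, accuracy tends to move together with precision and recall; when one class dominates, accuracy can stay high while recall for the rare class is poor.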

Hi Julia, thank you so much for all your great and informative screencasts! I have a question about the partial dependence plots. In the pdp package in R, there is an option to add a rug to the x-axis that displays the deciles of the distribution because, according to the package manual, "It is not wise to draw conclusions from PDPs in regions outside the area of the training data". Is this option also available in DALEX?

hamidehmoayyed
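This does not answer whether DALEX has a rug option, but the underlying idea is easy to reproduce by hand: a partial dependence curve is the average prediction after setting one feature to each grid value for every row, and a rug at the training deciles shows where the data actually are. A minimal base R sketch, assuming a logistic regression on simulated data; partial_dependence() is a hypothetical helper, not a DALEX or pdp function.

```r
set.seed(42)

# Simulated training data (hypothetical)
n <- 200
train <- data.frame(x1 = rnorm(n), x2 = runif(n))
train$y <- rbinom(n, 1, plogis(1.5 * train$x1 - train$x2))
fit <- glm(y ~ x1 + x2, data = train, family = binomial)

# For each grid value: overwrite the feature in every row, predict, average
partial_dependence <- function(model, data, feature, grid) {
  sapply(grid, function(v) {
    data_mod <- data
    data_mod[[feature]] <- v
    mean(predict(model, newdata = data_mod, type = "response"))
  })
}

grid <- seq(min(train$x1), max(train$x1), length.out = 25)
pd <- partial_dependence(fit, train, "x1", grid)

# Deciles of the training distribution, drawn as a rug on the x-axis
deciles <- quantile(train$x1, probs = seq(0.1, 0.9, by = 0.1))
plot(grid, pd, type = "l", xlab = "x1", ylab = "Average predicted probability")
rug(deciles)
```

Stretches of the curve far from any rug tick rest on extrapolation, which is exactly the caution quoted from the pdp manual.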

Julia, your videos are helping me so much in grad school, thank you! Is it always the case that vfold_cv has a higher ratio of training to assessment data, and is there a good threshold for saying 'I should really switch to bootstraps, this vfold is too small'? (I noticed you did the same thing in the IKEA furniture video.) Also, would there be much difference here if you used a different tree-based model, such as a random forest (rand_forest() with the ranger engine) as in the IKEA video? And how would you decide which to use?

JJManioke
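The size question comes down to simple arithmetic. With V folds, each split uses (V - 1)/V of the rows for analysis and 1/V for assessment (rsample's vfold_cv() defaults to v = 10). A bootstrap resample draws n rows with replacement for analysis, and the rows never drawn (out-of-bag) form the assessment set, with expected fraction (1 - 1/n)^n, roughly 36.8% for large n. A base R sketch; the helper names are hypothetical, and the numbers are not a recommended threshold for switching methods.

```r
# Rows in the analysis and assessment sets for one V-fold split
vfold_split_sizes <- function(n, v) {
  assessment <- n %/% v  # approximate when n is not a multiple of v
  c(analysis = n - assessment, assessment = assessment)
}

# Expected fraction of rows left out of one bootstrap resample
expected_oob_fraction <- function(n) (1 - 1 / n)^n

set.seed(1)
n <- 100
boot_idx <- sample(n, n, replace = TRUE)       # analysis rows (with repeats)
oob_rows <- setdiff(seq_len(n), boot_idx)      # assessment rows

print(vfold_split_sizes(n, v = 10))
print(expected_oob_fraction(n))
print(length(oob_rows) / n)
```

So with small datasets, 10-fold assessment sets hold only a handful of rows each, while bootstrap assessment sets are larger but vary in size from resample to resample.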

I wanted to see who holds the top record for each track as of the latest date we can see.

library(dplyr)
# Assumes `records` is already loaded; keeps each track's rows from its latest date
records %>%
  group_by(track) %>%
  filter(date == max(date))

There are 16 tracks, so we should have 16 records if we're grouping by track, right? However, this is not the case.

Notice that there are many duplicate records that show both shortcut "Yes" and shortcut "No" while being identical in everything else.

For example, in rows 4:7 for Kalimari Desert, Dan has two records set on the same day, with two observations for each record: one with shortcut == "Yes" and one with shortcut == "No".
I am pretty sure you cannot get exactly the same time with and without shortcuts. Maybe I am wrong, but these times are identical down to the thousandth of a second.

Then again, look at rows 12:13:

   track          type        shortcut  player  system_played  date        time_period  time   record_duration
12 Wario Stadium  Single Lap  No        Dan     PAL            2021-01-26  1M 25.82S    85.82  31
13 Wario Stadium  Single Lap  Yes       Dan     PAL            2021-01-26  1M 25.82S    85.82  31

I think that to get accurate predictions we have to weed out the incorrect duplicates. How do we know which observation is the correct one to use for shortcut prediction?

You can see right at 4:00 in your video that the single-lap records are almost identical for No and Yes; they're duplicate records. This cannot be correct. The split does make sense for the three-lap times, where using shortcuts significantly reduces the time, but the single-lap records are almost completely duplicated.

Therefore, shortcut predictions should only be made on three-lap records, and single-lap records should be omitted.

The data definitely needs to be cleaned up before any prediction is performed. Makes sense?

My-NaMeS_jEfF
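Flagging the suspect rows and applying the proposed three-lap filter can be sketched in base R. This only identifies rows that are identical in every column except shortcut; it cannot tell which of the pair is correct. The data frame is a toy stand-in with a subset of the columns: the two single-lap rows mirror rows 12:13 above, while the two three-lap rows (players, dates, times) are hypothetical placeholders, and the "Three Lap" label is an assumption about how the type column is coded.

```r
# Toy stand-in for `records` (three-lap rows are hypothetical)
records <- data.frame(
  track = rep("Wario Stadium", 4),
  type = c("Single Lap", "Single Lap", "Three Lap", "Three Lap"),
  shortcut = c("No", "Yes", "No", "Yes"),
  player = c("Dan", "Dan", "Player A", "Player B"),
  system_played = "PAL",
  date = as.Date(c("2021-01-26", "2021-01-26", "2021-01-01", "2021-01-01")),
  time = c(85.82, 85.82, 200, 60),
  stringsAsFactors = FALSE
)

# Rows that match another row in every column except `shortcut`
key_cols <- setdiff(names(records), "shortcut")
same_except_shortcut <- duplicated(records[key_cols]) |
  duplicated(records[key_cols], fromLast = TRUE)
suspect_rows <- records[same_except_shortcut, ]
print(suspect_rows)

# The cleanup proposed above: keep only three-lap records for shortcut modeling
three_lap <- records[records$type == "Three Lap", ]
print(three_lap)
```

On the full data the same duplicated() check would show how many single-lap rows are affected before deciding whether to drop them.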

Miss Julia, where do you live and where did you study? Please let me know what you have to do to become a data

Dolandtromm

Partial Dependence Plots (Opening the Black Box)

Model Auditing: Partial Dependence Plot (PDP) and Individual Conditional Expectation (ICE)

Partial Dependence Plot (PDP) in Python

Partial Dependence Plot in Knime

Interpretable Machine Learning - Feature Effects - Partial Dependence (PD) Plot

Partial dependency plot: How feature impact model prediction?

4.3 - Partial Dependence Plot - eXplainable AI

23. Partial Dependence Plot / DAI Starter Course

Explaining Hyperparameter Optimization via Partial Dependence Plots (NeurIPS'21)

R : Partial dependence plot from an xgboost model in R

Kaggle 30 Days of ML (Day 17) - Partial Dependence Plot - Interpretable Machine Learning - XAI

R : How can I create a Partial Dependence plot for a categorical variable in R?

Interpretable Machine Learning - Feature Effects - Individual Conditional Expectation (ICE) Plots

Interpretable Machine Learning - Feature Effects - Accumulated Local Effect (ALE) Plot

#129: Scikit-learn 123: Visualizations

Variable importance: Less is more

Explain Machine-learning Models: Individual Conditional Expectation (ICE) in Python

SMOOTHNESS AND MONOTONICITY CONSTRAINTS FOR NEURAL NETWORKS USING ICEnet

Model Interpretability in MATLAB

Gabe sees Gaby again for the first time since she moved away months ago. Nonverbal Autism Family

Interpreting Black-Box Supervised Learning Models Via Accumulated Local Effects

Machine learning of factors related to production crashes at an oyster hatchery

Use R to Plot Johnson Neyman