Machine Learning Model Comparison with Bootstrap Resampling | sklearn Implementation

At some point in your machine learning analysis, you want to be able to say that classifier A is better than classifier B and that the difference is statistically significant. In this video I will show you a technique that lets you make such a statement, which makes use of Bootstrap Resampling.
Acknowledgement:
- music from the youtube library
- used seaborn for the violin plot
- thumbnail made with Canva
This is what Wikipedia has to say about Bootstrap Methods:
"Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance, confidence intervals, prediction error, etc.) to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.
Bootstrapping estimates the properties of an estimator (such as its variance) by measuring those properties when sampling from an approximating distribution. One standard choice for an approximating distribution is the empirical distribution function of the observed data. In the case where a set of observations can be assumed to be from an independent and identically distributed population, this can be implemented by constructing a number of resamples with replacement, of the observed data set (and of equal size to the observed data set).
It may also be used for constructing hypothesis tests. It is often used as an alternative to statistical inference based on the assumption of a parametric model when that assumption is in doubt, or where parametric inference is impossible or requires complicated formulas for the calculation of standard errors."
The technique can be summarized as follows:
- Sample your data with replacement to create a bootstrap dataset (same length as the original).
- Run your machine learning pipeline on that bootstrap dataset.
- Add the resulting performance metric to your distribution.
- Repeat this N times.
At the end of this process you will have a distribution of your performance metric, which you can then compare against the distribution from another model. If the middle 95% of the distributions from models A and B don't overlap, you can say that the improvement of model B is statistically significant with p < 0.05 for a bootstrap resampling of n = 1000!
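The steps above can be sketched in sklearn. This is a minimal illustration, not the exact code from the video: the dataset, the two models, and n_boot = 200 (kept small for speed; the video uses n = 1000) are all placeholder choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def bootstrap_scores(make_model, n_boot=200, seed=0):
    """Refit a fresh model on n_boot bootstrap resamples of the
    training set; score each fit on the held-out test set."""
    rng = np.random.RandomState(seed)
    n = len(X_train)
    scores = []
    for _ in range(n_boot):
        # sample with replacement, same length as the training set
        idx = rng.randint(0, n, n)
        model = make_model()
        model.fit(X_train[idx], y_train[idx])
        scores.append(accuracy_score(y_test, model.predict(X_test)))
    return np.array(scores)

# placeholder models A and B
scores_a = bootstrap_scores(lambda: DecisionTreeClassifier(random_state=0))
scores_b = bootstrap_scores(
    lambda: make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)))

# middle 95% of each bootstrap distribution
lo_a, hi_a = np.percentile(scores_a, [2.5, 97.5])
lo_b, hi_b = np.percentile(scores_b, [2.5, 97.5])
print(f"model A 95% interval: [{lo_a:.3f}, {hi_a:.3f}]")
print(f"model B 95% interval: [{lo_b:.3f}, {hi_b:.3f}]")
if lo_b > hi_a:
    print("model B's improvement is statistically significant (p < 0.05)")
```

The two score arrays are also exactly what you would hand to seaborn's violin plot to visualize the overlap of the distributions.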
----
----
Follow Me Online Here:
___
Have a great week! 👋