Machine Learning Classification How to Deal with Imbalanced Data ❌ Practical ML Project with Python


Microsoft Azure Certified:

Databricks Certified:

---

---

COURSERA SPECIALIZATIONS:

COURSES:

LEARN PYTHON:

LEARN SQL:

LEARN STATISTICS:

LEARN MACHINE LEARNING:

---

For business enquiries please connect with me on LinkedIn or book a call:

Disclaimer: I may earn a commission if you decide to use the links above. Thank you for supporting the channel!

#DecisionForest
Comments

I am not sure this is the best way to deal with data imbalance, and it won't work in a real case. You used SMOTE to balance the dataset and then drew your test set from the oversampled data, which is synthetic. To make sure your model is working well, you have to hold out part of the original imbalanced dataset as your test set and then apply SMOTE only to the rest. That way your test set is a faithful representation of the original data. I am sure your F1-score will then be very small. Some of the best methods are One-Class Support Vector Machines (OCSVM), Generalized One-class Discriminative Subspaces (GODS), One-Class CNNs (OCCNN) and Deep SVDD (DSVDD).

amansamsonmogos
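
A minimal sketch of the split-first workflow this comment describes, assuming scikit-learn and imbalanced-learn, with X and y standing in for the original imbalanced features and labels (hypothetical names, not from the video):

# X, y: the original imbalanced features and labels (assumed to exist already).
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE

# Hold out a test set that keeps the original class ratios.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Oversample the training split only; the test split stays untouched.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

clf = RandomForestClassifier(random_state=42).fit(X_train_res, y_train_res)
print(f1_score(y_test, clf.predict(X_test), average="macro"))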

Hello, thanks for the video.

However, I noticed that you did SMOTE before running the train/test split. I am afraid this might be what is causing the results to improve so drastically, since the upsampled observations from the minority class may have leaked into the test dataset. So basically your model trained and tested on pretty much the same data, which inflated the results.

Let me know what you think.

ammarkamran
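
One way to rule out the leakage described above is to put the resampling step inside an imbalanced-learn Pipeline, so SMOTE is refit on the training portion of each cross-validation fold only. A rough sketch, again assuming hypothetical X and y holding the original imbalanced data:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline  # imblearn's Pipeline accepts samplers

# SMOTE is applied inside each training fold; validation folds keep their imbalance.
pipe = Pipeline([
    ("smote", SMOTE(random_state=42)),
    ("model", LogisticRegression(max_iter=1000)),
])
scores = cross_val_score(pipe, X, y, cv=5, scoring="f1_macro")
print(scores.mean())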

You do realize that in your pipeline, once you run the oversample step, you have 6 perfectly balanced groups with 900 samples in each group. There's no real majority class to sample from. When you then undersample from a perfectly balanced dataset, it appears to leave one group intact and resample the others. If you plot the data, it will look essentially the same as before, when you only oversampled, with some samples missing and other samples duplicated. The scores will be similar as well.

philwebb
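
A quick illustration of that point, assuming hypothetical X and y with six classes as in the example above: with the default sampling_strategy, the undersampler's target is already satisfied once SMOTE has balanced everything, so the class counts do not change.

from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

X_over, y_over = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_over))    # every class at the same count, e.g. 900 each

# The default strategy undersamples only classes larger than the minority;
# after SMOTE there are none, so the counts stay the same.
X_both, y_both = RandomUnderSampler(random_state=42).fit_resample(X_over, y_over)
print(Counter(y_both))    # unchanged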

Can you suggest any techniques for handling an imbalanced image dataset?
Thank you.

nickpgr

In order to truly evaluate, you need to test on an IMBALANCED test set. :) You can train on a balanced training set, but the hold-out set needs to keep the true imbalance, because in the real world the data you encounter will have the same imbalance, and that's what your performance metric needs to measure: how well you score on unseen imbalanced data.

TrainingDay
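
Building on that, one way to report performance on the untouched, imbalanced hold-out is with per-class metrics rather than plain accuracy. A sketch that reuses clf, X_test and y_test from the split-then-SMOTE sketch earlier in the thread (all hypothetical names):

from sklearn.metrics import balanced_accuracy_score, classification_report

# X_test / y_test keep the real class ratios; nothing synthetic in here.
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))       # per-class precision/recall/F1
print(balanced_accuracy_score(y_test, y_pred))     # recall averaged over classes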

Hi, thanks for the content!

I am confused: instead of applying this method to the y variable, can I apply this technique to imbalanced predictors whose levels have large differences in sample size?

For example, class A: 900, class B: 100, class C: 2.

Thanks!

kar

Unfortunately, your website link and notebook link are not available here. Any suggestions?

subhajit

When should we use under-sampling? As I see it, there's a potential risk of losing information.

rahuldey

I don't understand how you apply under- and oversampling at the same time. One of them will balance the data, and the other one has nothing left to do...

Mustistics
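
Combining the two usually only makes sense with explicit targets, so that each step has something to do: oversample the minority part of the way, then trim the majority toward it. A sketch assuming a binary problem (float ratios only work for two classes) and hypothetical X, y:

from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Oversample the minority up to 50% of the majority size...
X_over, y_over = SMOTE(sampling_strategy=0.5, random_state=42).fit_resample(X, y)
# ...then undersample the majority until the minority:majority ratio is 0.8.
X_res, y_res = RandomUnderSampler(sampling_strategy=0.8, random_state=42).fit_resample(X_over, y_over)

print(Counter(y), Counter(y_over), Counter(y_res))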

It is not clear, sir, and I have a question: what technique did you use to address the class imbalance problem?

wliiliammitiku

You said this deals with 'multi-class classification problems'. But what if we have imbalanced data in a binary classification problem?

titow
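
SMOTE and the other samplers handle the binary case in the same way, and class weighting is a resampling-free alternative. A sketch assuming hypothetical X_train, y_train with a binary target:

from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

# SMOTE oversamples whichever class is the minority, whether there are two classes or many.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Resampling-free alternative: penalise mistakes on the rare class more heavily.
clf = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_train, y_train)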

Please, how can we get the Jupyter notebook code?

tahirullah

Thanks for this share.
Could you please send me this code?
I need it.

mahdimed

Excuse me, is there any way to find the original Notebook file? Can't open the one in the description. Thank you.

itstoufique

Are you sure the undersampling method works? The numbers are still the same, 900 per class.

farisocta