Handling imbalanced dataset in machine learning | Deep Learning Tutorial 21 (Tensorflow2.0 & Python)

Credit card fraud detection, cancer prediction, and customer churn prediction are some examples where you might get an imbalanced dataset. Training a model on an imbalanced dataset requires certain adjustments, otherwise the model will not perform as you expect. In this video I discuss various techniques to handle an imbalanced dataset in machine learning, along with Python code that demonstrates each technique. At the end there is an exercise for you to solve, along with a solution link.

#imbalanceddataset #imbalanceddatasetinmachinelearning #smotetechnique #deeplearning #imbalanceddatamachinelearning

Topics
00:00 Overview
00:01 Handle imbalance using undersampling
02:05 Oversampling (blind copy)
02:35 Oversampling (SMOTE)
03:00 Ensemble
03:39 Focal loss
04:47 Python coding starts
07:56 Code - undersampling
14:31 Code - oversampling (blind copy)
19:47 Code - oversampling (SMOTE)
24:26 Code - Ensemble
35:48 Exercise


DISCLAIMER: All opinions expressed in this video are my own and not those of my employer.
Comments

Hi. You should perform under/over sampling (including SMOTE) only on the training data, and measure F1 on the original data distribution (the test data). Moreover, if you oversample the data and then divide it with train_test_split, you have no control over how duplicated items are distributed between train and test. That means the same observation can appear in both sets, so you partially test on the training set, which is why the results improve. So first divide into train/test, then perform these operations only on the training set, and leave the test set unchanged.
Still, it's a very good tutorial; it's nice that you share your knowledge!!
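
A minimal sketch of the leak-free workflow this comment describes, assuming scikit-learn and imbalanced-learn with placeholder data (not the video's dataset or model):

# Split first, then resample only the training set; evaluate on untouched data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Placeholder imbalanced dataset (roughly a 90/10 class split).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Resample the training portion only; the test set keeps the real distribution.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = LogisticRegression(max_iter=1000).fit(X_train_res, y_train_res)
print(f1_score(y_test, model.predict(X_test)))  # F1 on the original distribution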

magdalenawielobob
Автор

For those who are watching recently: the SMOTE method is now "fit_resample". Also, if you can't import imbalanced-learn properly, try restarting the kernel.
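
For reference, a minimal usage sketch with the renamed API, on synthetic data (just an illustration, not the video's example):

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))  # imbalanced, roughly 900 vs 100

# fit_resample replaced the older fit_sample method.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # both classes now the size of the original majority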

tugrulpinar

Thank you so much for sharing this interesting information about data transformation. I was training a neural network that gave an AUC of 0.85; after balancing the classes with SMOTE it reached an AUC of 0.93. The F1-score and accuracy also improved. Thanks!

stanleypiche

Hey codebasics, love this video series! I think there's a pretty big mistake in the oversampling, though. You upsample, then do the train/test split. This means there will be overlapping samples in the train and test data, so the model will already have seen some of the data you are testing it on. I think you need to do your train/test split first, then do the upsampling on the train data only.
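
A sketch of that fix applied to the blind-copy variant, assuming pandas and placeholder data with a hypothetical 'label' column (the video's churn dataset is not used here):

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder imbalanced dataset with a binary 'label' column.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
df = pd.DataFrame(X)
df['label'] = y

# Split first, so duplicated rows can never cross into the test set.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df['label'], random_state=42)

# Blind-copy oversampling on the training set only: duplicate minority rows
# with replacement until both classes are the same size.
majority = train_df[train_df['label'] == 0]
minority = train_df[train_df['label'] == 1]
minority_up = minority.sample(n=len(majority), replace=True, random_state=42)
train_balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)

# test_df stays untouched; evaluate on it as-is.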

tjbwhitehea

Thanks for showing us the path. Please keep doing the good work and don't get upset by lower view counts; you are a true inspiration for all of us.

Rajdeepsharma

Thanks a lot, codebasics, for all of your valuable and informative content.

venkatesanr

The way you present the information is excellent. Thanks for sharing your knowledge; I'm happy to watch your videos.

odaithalji

Thank you very much for this video. It actually helps in solving real-world scenarios.

manansharma

I always learn something new watching your videos. Thank you 🙏🏻

tchintchie

Thank you. Very clear instruction, and linked to an ANN too, as I've only used these techniques with supervised ML.

CarolynPlican

Great presentation! I think I just needed SMOTE for my assignment, but I liked how you explained every method.

ybbetter

I was actually doing the churn modeling project and this video popped up! Thanks a lot :)

honeyBadger

Hats off to you, Dhaval. I loved your way of teaching and clarifying my concepts. Thank you so much.

shylashreedev

Thank you again Dhaval. I really appreciate your efforts!!

gurkanyesilyurt

Thanks for the great content. For the ensemble method, could we draw a random sample of the majority class (n = minority class length) for each model? Then we could create more models for the vote.
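
That idea is essentially undersampling-based bagging. A rough sketch of it, assuming scikit-learn with placeholder data (imbalanced-learn also packages this pattern as BalancedBaggingClassifier):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder imbalanced data.
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

min_idx = np.where(y_train == 1)[0]
maj_idx = np.where(y_train == 0)[0]

rng = np.random.default_rng(0)
models = []
for _ in range(7):  # an odd number of voters avoids ties
    # A fresh random majority sample of minority size for each model.
    sampled_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([sampled_maj, min_idx])
    models.append(LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx]))

# Majority vote across the individual predictions.
votes = np.array([m.predict(X_test) for m in models])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)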

josebordon

Only in this video does it look like your patience was running out, sir... haha... but it's still quality content delivery and a great explanation. Thanks a lot, sir!

yogeshbharadwaj

Very helpful; your videos make everything easier. A thousand thumbs up for you 👍👍

muhammadhollandi

You answered my question in only 4 minutes. Great, thank you!

GuilhermeOliveira-seth

Thank you so much; I appreciate your work.

farhodkalonov

Hey, great video.
Can you also make a video on how to handle class overlap (specifically in imbalanced binary classification)?
Thank you

mitalikatoch