Handling imbalanced dataset in machine learning | Deep Learning Tutorial 21 (Tensorflow2.0 & Python)

Credit card fraud detection, cancer prediction, and customer churn prediction are some examples where you might get an imbalanced dataset. Training a model on an imbalanced dataset requires certain adjustments, otherwise the model will not perform as you expect. In this video I discuss various techniques to handle an imbalanced dataset in machine learning, along with Python code that demonstrates each technique. At the end there is an exercise for you to solve, along with a solution link.

#imbalanceddataset #imbalanceddatasetinmachinelearning #smotetechnique #deeplearning #imbalanceddatamachinelearning

Topics
00:00 Overview
00:01 Handle imbalance using undersampling
02:05 Oversampling (blind copy)
02:35 Oversampling (SMOTE)
03:00 Ensemble
03:39 Focal loss
04:47 Python coding starts
07:56 Code - undersampling
14:31 Code - oversampling (blind copy)
19:47 Code - oversampling (SMOTE)
24:26 Code - Ensemble
35:48 Exercise


DISCLAIMER: All opinions expressed in this video are my own and not those of my employer.
Comments

Hi. You should perform under/over sampling (including SMOTE) only on the training data, and measure F1 on the original data distribution (the test data). Moreover, if you oversample the data and then divide it with train_test_split, you have no control over how duplicated items are distributed between train and test. That means the same observation can appear in both sets, so you partially test on the training set, which is why the results improve. So first divide into train/test, then perform these operations only on the training set, and leave the test set unchanged.
Still, it's a very good tutorial; it's nice that you share your knowledge!!
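
A minimal sketch of the leak-free workflow this comment describes, assuming scikit-learn and imbalanced-learn with placeholder data (not the video's dataset or model):

# Split first, then resample only the training set; evaluate on untouched data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE

# Placeholder imbalanced dataset (roughly a 90/10 class split).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Resample the training portion only; the test set keeps the real distribution.
X_train_res, y_train_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

model = LogisticRegression(max_iter=1000).fit(X_train_res, y_train_res)
print(f1_score(y_test, model.predict(X_test)))  # F1 on the original distribution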

magdalenawielobob
Автор

For those who are watching recently: the SMOTE method is now "fit_resample". Also, if you can't import imbalanced-learn properly, try restarting the kernel.
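
For reference, a minimal usage sketch with the renamed API, on synthetic data (just an illustration, not the video's example):

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print(Counter(y))  # imbalanced, roughly 900 vs 100

# fit_resample replaced the older fit_sample method.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y_res))  # both classes now the size of the original majority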

tugrulpinar

Thank you so much for sharing this interesting information about data transformation. I was training a neural network that gave an AUC of 0.85; after balancing the classes with SMOTE it reached an AUC of 0.93. The F1-score and accuracy also improved. Thanks!

stanleypiche

Hey codebasics, love this video series! I think there's a pretty big mistake in the oversampling, though. You upsample, then do the train/test split. This means there will be overlapping samples in the train and test data, so the model will already have seen some of the data you are testing it on. I think you need to do your train/test split first, then do the upsampling on the train data only.
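
A sketch of that fix applied to the blind-copy variant, assuming pandas and placeholder data with a hypothetical 'label' column (the video's churn dataset is not used here):

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder imbalanced dataset with a binary 'label' column.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=42)
df = pd.DataFrame(X)
df['label'] = y

# Split first, so duplicated rows can never cross into the test set.
train_df, test_df = train_test_split(
    df, test_size=0.2, stratify=df['label'], random_state=42)

# Blind-copy oversampling on the training set only: duplicate minority rows
# with replacement until both classes are the same size.
majority = train_df[train_df['label'] == 0]
minority = train_df[train_df['label'] == 1]
minority_up = minority.sample(n=len(majority), replace=True, random_state=42)
train_balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)

# test_df stays untouched; evaluate on it as-is.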

tjbwhitehea

Thanks for showing us the path. Please keep doing the good work and don't get upset by lower view counts; you are a true inspiration for all of us.

Rajdeepsharma

Thanks a lot, codebasics, for all of your valuable and informative content.

venkatesanr

The way you present the information is excellent. Thanks for sharing your knowledge; I'm happy to watch your videos.

odaithalji

Thank you very much for this video. It actually helps in solving real-world scenarios.

manansharma

I always learn something new watching your videos. Thank you 🙏🏻

tchintchie

Thank you. Very clear instruction, and linked to an ANN too, as I've only used these techniques with supervised ML.

CarolynPlican

Great presentation! I think I just needed SMOTE for my assignment, but I liked how you explained every method.

ybbetter

I was actually doing the churn modeling project and this video popped up! Thanks a lot :)

honeyBadger

Hats off to you, Dhaval. I loved your way of teaching and clarifying my concepts. Thank you so much.

shylashreedev

Thank you again Dhaval. I really appreciate your efforts!!

gurkanyesilyurt

Thanks for the great content. For the ensemble method, could we draw a random sample of the majority class (n = minority class length) for each model? Then we could create more models for the vote.
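
That idea is essentially undersampling-based bagging. A rough sketch of it, assuming scikit-learn with placeholder data (imbalanced-learn also packages this pattern as BalancedBaggingClassifier):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder imbalanced data.
X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

min_idx = np.where(y_train == 1)[0]
maj_idx = np.where(y_train == 0)[0]

rng = np.random.default_rng(0)
models = []
for _ in range(7):  # an odd number of voters avoids ties
    # A fresh random majority sample of minority size for each model.
    sampled_maj = rng.choice(maj_idx, size=len(min_idx), replace=False)
    idx = np.concatenate([sampled_maj, min_idx])
    models.append(LogisticRegression(max_iter=1000).fit(X_train[idx], y_train[idx]))

# Majority vote across the individual predictions.
votes = np.array([m.predict(X_test) for m in models])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)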

josebordon

Only in this video does it look like your patience was running out, sir... haha... but it's still quality content delivery and a great explanation. Thanks a lot, sir!

yogeshbharadwaj

Very helpful; your videos make everything easier. A thousand thumbs up for you 👍👍

muhammadhollandi

You answered my question in only 4 minutes. Great, thank you!

GuilhermeOliveira-seth

Thank you so much; I appreciate your work.

farhodkalonov

Hey, great video.
Can you also make a video on how to handle class overlap (specifically in imbalanced binary classification)?
Thank you

mitalikatoch