How to deal with Imbalanced Datasets in PyTorch - Weighted Random Sampler Tutorial

In this video we take a look at how to solve the super common problem of having an imbalanced or skewed dataset. Specifically, we look at two methods, oversampling and class weighting, and how to do both in PyTorch.
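For reference, a minimal sketch of both methods; the folder name "dataset" and the 1:50 class weights are placeholder assumptions, not values from the video:

import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, WeightedRandomSampler

# Method 1: class weighting -- make mistakes on the rare class cost more.
# Assumes two classes with roughly a 50:1 imbalance.
loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 50.0]))

# Method 2: oversampling -- draw rare-class examples more often.
dataset = datasets.ImageFolder(root="dataset", transform=transforms.ToTensor())
class_weights = [1.0, 50.0]          # one weight per class, rare class highest
sample_weights = [0] * len(dataset)  # one weight per example
for idx, (data, label) in enumerate(dataset):
    sample_weights[idx] = class_weights[label]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)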

Toy dataset used in video:

❤️ Support the channel ❤️

Paid Courses I recommend for learning (affiliate links, no extra cost for you):

✨ Free Resources that are great:

💻 My Deep Learning Setup and Recording Setup:

GitHub Repository:

✅ One-Time Donations:

▶️ You Can Connect with me on:
Comments

A tip I didn't mention in the video: when iterating through the dataset to create the sample weights, iterate through dataset.imgs rather than the dataset itself, as shown below. This runs much faster because we skip the resizing, transformations, and so on, which we don't need when we're only interested in the labels of the examples.

AladdinPersson
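A sketch of that tip, assuming a torchvision ImageFolder dataset; ImageFolder stores (file_path, class_index) pairs in its .imgs attribute, so the labels can be read without loading a single image:

# Slow: indexing the dataset loads, resizes, and transforms every image.
sample_weights = [0] * len(dataset)
for idx, (data, label) in enumerate(dataset):
    sample_weights[idx] = class_weights[label]

# Fast: dataset.imgs is a list of (file_path, class_index) tuples,
# so the labels are available directly.
sample_weights = [class_weights[label] for _, label in dataset.imgs]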

The PyTorch WeightedRandomSampler is an amazing feature. Thanks for talking about it here.

imveryhungry

You, sir, deserve way more subscribers for the consistency and diversity of the topics you choose. Keep up the good work.

sahasamanecheppali

To tackle the imbalance, one can also use the focal loss function; the kornia library has an implementation.

mochametmachmout
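kornia does ship a focal loss in kornia.losses; the core idea also fits in a few lines of plain PyTorch. A sketch, with typical (untuned) alpha and gamma values:

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Focal loss down-weights easy, confidently classified examples,
    # so training focuses on the hard (often minority-class) ones.
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log(p_t)
    p_t = torch.exp(-ce)                                     # confidence in the true class
    return (alpha * (1 - p_t) ** gamma * ce).mean()

logits = torch.randn(4, 2)            # 4 examples, 2 classes
targets = torch.tensor([0, 1, 1, 1])
print(focal_loss(logits, targets))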

Great video! A small bug: the traversal orders of os.walk and datasets.ImageFolder are different, so in the GitHub code we cannot guarantee that the class with the smaller number of samples gets the greater sampling weight.

jacoblee
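One way to sidestep that bug is to derive the class counts from the dataset object itself rather than from os.walk, so the ordering is guaranteed to match ImageFolder's own class indices (a sketch, assuming torchvision's ImageFolder):

from collections import Counter

# dataset is assumed to be a torchvision ImageFolder.
targets = [label for _, label in dataset.imgs]
counts = Counter(targets)  # {class_index: num_samples}

# Inverse-frequency weights, indexed exactly as ImageFolder indexes classes,
# so the smallest class is guaranteed the largest weight.
class_weights = [1.0 / counts[i] for i in range(len(dataset.classes))]
sample_weights = [class_weights[label] for label in targets]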

Awesome video bro, this has been really helpful. I'd like to share one trick of mine: apply more and stronger random data augmentation to the examples that are scarce, less random augmentation to the classes that have plenty, and then make sure each batch the model receives has an equal number of examples for each unique label. The only side effect is that you have to write your own custom dataloader for that 🥱, and to be honest, it's not easy 😂, but once you set it up it's just a matter of copying and pasting for the next projects :)

Thanks again for the video.

wolfisraging
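The "equal number of examples per label in each batch" part can be done with a custom batch sampler instead of a full custom dataloader; a rough sketch (the class-dependent augmentation would still live in the Dataset's __getitem__):

import random
from torch.utils.data import DataLoader

class BalancedBatchSampler:
    # Yields index batches containing the same number of examples per class.
    def __init__(self, labels, n_per_class, n_batches):
        self.by_class = {}
        for idx, label in enumerate(labels):
            self.by_class.setdefault(label, []).append(idx)
        self.n_per_class = n_per_class
        self.n_batches = n_batches

    def __iter__(self):
        for _ in range(self.n_batches):
            batch = []
            for indices in self.by_class.values():
                # Sample with replacement so small classes can keep up.
                batch += random.choices(indices, k=self.n_per_class)
            random.shuffle(batch)
            yield batch

    def __len__(self):
        return self.n_batches

# labels is assumed to be the list of class indices, one per example:
# loader = DataLoader(dataset, batch_sampler=BalancedBatchSampler(labels, 4, 100))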

Thanks Aladdin for all your videos; they are really awesome and informative.

kirankharel

Thanks Aladdin! Would you mind recommending your learning resources to us? I mean, most of your teaching content is pretty hard to find in any ML/DL book.

pleomax

Thank you so much for your great content! I was wondering, is the loader in the video the train_loader? Do you also apply oversampling to the dev_train and test dataloaders?

rosacanina

Awesome video, and this channel is so underrated in the DL community. Will there be paper implementation tutorials in the future?

thantyarzarhein

Dude, you are awesome. I like your PyTorch tutorials, but I would love it if you could use Google Colab for the next ones.

ZulkaifAhmed

Superb!!
I implemented AugMix data augmentation myself to increase the minority label's samples. My question is: should we stick with one state-of-the-art data augmentation technique, or should we try all of the other techniques too?
Thanks

sahil-

Hello, how can we apply undersampling in your case? Can oversampling and undersampling be applied together?

Thank you

rafaelmahammadli
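One hedged answer to that question: the same WeightedRandomSampler can undersample if you draw without replacement and cap num_samples, so most majority-class examples are simply not drawn in an epoch (a sketch, not from the video):

from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler

# Inverse-frequency sample weights, computed as before from ImageFolder labels.
targets = [label for _, label in dataset.imgs]
counts = Counter(targets)
sample_weights = [1.0 / counts[label] for label in targets]

# replacement=False draws each example at most once; capping num_samples
# near num_classes * smallest-class size drops most majority examples.
min_count = min(counts.values())
sampler = WeightedRandomSampler(sample_weights, num_samples=min_count * len(counts), replacement=False)
loader = DataLoader(dataset, batch_size=8, sampler=sampler)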

Another amazing tutorial! I wonder if you have a Patreon page where we can support you; it would be well deserved. I have a question as well: I have a large imbalanced dataset, and I need to call the getloader function to get train and test loaders for hyperparameter tuning. Scanning the entire train set in each function call makes the code slower. Do you suggest a workaround? Thank you!

erdi
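One possible workaround: compute the sample weights once, from the cheap label list, and pass them into the loader-building function so repeated calls during tuning skip the scan (get_loader and its signature here are assumptions, not necessarily the repo's code):

from collections import Counter
from torch.utils.data import DataLoader, WeightedRandomSampler

# Done once, up front -- cheap because only the labels are read.
targets = [label for _, label in dataset.imgs]
counts = Counter(targets)
sample_weights = [1.0 / counts[label] for label in targets]

def get_loader(dataset, sample_weights, batch_size):
    sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)

# Called many times during hyperparameter tuning -- no rescanning.
train_loader = get_loader(dataset, sample_weights, batch_size=64)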

Hi, how can we use the WeightedRandomSampler for an object detection task?

yashrunwal

Aladdin, you deserve more subscribers. And you need to charge more :) Just joined as a member.

nishantyadav

Do we have to call the get-loader function once for the train set and once for the test set, or just a single time for the entire dataset?

Zulle

Why did you multiply the sample_weights (which is zero) by len(dataset)?

fasolya
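For anyone else puzzled by that line: it is Python list repetition, not arithmetic. Assuming the video's code follows the usual pattern, it just pre-allocates one placeholder weight per example:

# [0] * len(dataset) builds a list of len(dataset) zeros
# (list repetition, not multiplying values) -- one slot per
# example, which the loop then fills with the real weight.
sample_weights = [0] * len(dataset)
for idx, (data, label) in enumerate(dataset):
    sample_weights[idx] = class_weights[label]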

What is the editor or IDE you are using?

mariaanson

RandAugment is one of the best augmentation methods; it will improve your model's performance. I was also using weight normalization like this:
nSamples = [346, 168, 106]  # samples per class
normedWeights = [1 - (x / sum(nSamples)) for x in nSamples]
normedWeights = torch.FloatTensor(normedWeights)
print(normedWeights)

Is your second method different from this?

mustafabuyuk
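Both approaches end up as per-class weights in the loss; they differ only in the formula (1 minus class frequency above, versus the hand-picked or inverse-frequency weights in the video). A sketch of how either tensor plugs into CrossEntropyLoss:

import torch
import torch.nn as nn

nSamples = [346, 168, 106]
normedWeights = torch.FloatTensor([1 - x / sum(nSamples) for x in nSamples])
inverseFreq = torch.FloatTensor([sum(nSamples) / x for x in nSamples])

# Either weighting can be handed to the loss; a larger weight means
# misclassifying that class contributes more to the loss.
loss_fn = nn.CrossEntropyLoss(weight=normedWeights)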