Handling Imbalanced Dataset in Machine Learning: Easy Explanation for Data Science Interviews

Imbalanced data is one of the most common machine learning problems you'll come across in data science interviews. In this video, I cover what an imbalanced dataset is, what disadvantages it presents, and how to deal with imbalanced data when the minority class makes up only 1% of the data.

🟢Get all my free data science interview resources

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:

====================
Contents of this video:
====================
00:00 Introduction
01:20 Interview Questions
01:38 Imbalanced Data
03:15 Why it causes problems?
04:27 How to deal with imbalanced data?
08:13 Model-level methods
11:33 Evaluation Metrics
13:25 Outro
Comments

In my view, class imbalance does not by itself pose a problem. A classifier ought to model the class membership probabilities, and these may be small. As long as they are correct, there is no problem. One should, of course, use proper scoring rules (i.e., not accuracy) to fit and evaluate the classifier.
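To make the point about proper scoring rules concrete, here is a minimal sketch in plain Python. The data and both forecasters are hypothetical, made up for illustration: 1,000 cases with a 1% positive rate, one forecaster that always outputs the base rate, and one that assigns higher probabilities to the true positives. Accuracy at a 0.5 threshold cannot tell them apart, while the Brier score and log loss (both proper scoring rules) can.

```python
import math

def accuracy(y_true, probs, threshold=0.5):
    """Fraction of correct hard predictions after thresholding."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)

def brier_score(y_true, probs):
    """Mean squared error between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, y_true)) / len(y_true)

def log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-likelihood of the outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

# Hypothetical data: 10 positives out of 1,000 cases (1% minority class).
y_true = [1] * 10 + [0] * 990

# Forecaster A (assumed): always predicts the base rate of 1%.
base_rate = [0.01] * 1000
# Forecaster B (assumed): gives the true positives 0.4 and the negatives 0.005.
informative = [0.4] * 10 + [0.005] * 990

for name, probs in [("base rate", base_rate), ("informative", informative)]:
    print(name, accuracy(y_true, probs), brier_score(y_true, probs), log_loss(y_true, probs))
```

Both forecasters predict the majority class for every case once thresholded at 0.5, so their accuracy is identical, yet the second forecaster scores better on both proper scoring rules because its probabilities carry more information about the rare class.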

Tetlock's Superforecasting serves as a wonderful and very readable introduction to predicting unbalanced classes.

michaeldarmanis

Hi Emma, this is a really good summary video on imbalanced datasets. Thank you and keep up the good work!

elonchan

Thanks Emma, these short videos come in handy when preparing for interviews.

AnkurSingh-mkrc

This video is amazing. It was easy to understand and summarized different possibilities for dealing with unbalanced data. Congratulations! Keep helping people. I am very grateful for your explanation!

dle

This video helped me clear an interview. Subscribed. Thank you.

ankgup

Best video on ML. I understood it very clearly. Thank you.

psg

Thanks Emma. Can we also have a series of videos on deploying ML models in production?

sanyam

This is really helpful. Thank you so much for putting out these videos!

Itsdanielpeng

Check out this paper on Gumbel loss/activation for the long-tailed LVIS dataset. It is an interesting method for imbalanced datasets.

thedislikebutton

Emma, great explanation and to the point.

qrpiowb

I enjoyed this video. Thanks for this, Emma.

ayambavictorndoma

Hi Emma,
These videos are really good.
Can you make a video on time series analysis?

SonuKumar-gtxs

Hi Emma. Could you talk about ChatGPT (including its model, dataset, algorithms, system design, etc.) in the next video? Thank you.

jasonswift

I have data with 150 samples in class 0 and only two samples in class 1.
Is there any way to do classification with this data?
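One common model-level option for a split like this is to weight each class inversely to its frequency in the loss function. The sketch below, in plain Python, computes such weights with the widely used heuristic `n_samples / (n_classes * class_count)`; the helper name `balanced_class_weights` is made up for illustration, and whether the video recommends this particular scheme is an assumption. With only two positive examples, any model will still be very uncertain about the minority class, so this is a starting point rather than a fix.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Hypothetical helper: rare classes get large weights, so each class
    contributes the same total weight to a weighted loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# The split described in the question: 150 samples of class 0, 2 of class 1.
labels = [0] * 150 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)

# Total weight per class is equal after reweighting.
totals = {cls: weights[cls] * labels.count(cls) for cls in weights}
print(totals)
```

The resulting dictionary can be passed to any learner that accepts per-class weights, so that misclassifying one of the two minority samples costs far more than misclassifying a majority sample.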

faisalmahmud

Hi! Is there a way you can share this Notion document? Thank you! Great content.

ATN_AI

Hey Emma, big fan of your work 😀. I'm looking for a series on model deployment. It would be great if you could cover processing (batch/stream), serving (batch/real-time), and learning (offline/online) in production. Sorry if it is a big ask 🥲

sambidpradhan

In the "why imbalance is important" part, the misleading accuracy of a model that predicts rare events can be addressed by relying on other evaluation metrics such as precision and recall, isn't that right? So that part doesn't really explain the why.
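The accuracy versus precision/recall point can be checked with a small worked example in plain Python. All numbers here are hypothetical: 1,000 cases with 10 positives, a trivial classifier that always predicts the majority class, and an assumed model that finds 6 of the 10 positives at the cost of 14 false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the rare class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Hypothetical data: 10 positives followed by 990 negatives.
y_true = [1] * 10 + [0] * 990

# Trivial classifier: always predicts the majority class.
always_negative = [0] * 1000
# Assumed model: 6 true positives, 4 misses, 14 false alarms, 976 true negatives.
model = [1] * 6 + [0] * 4 + [1] * 14 + [0] * 976

print("always negative:", metrics(y_true, always_negative))
print("model:          ", metrics(y_true, model))
```

The trivial classifier has the higher accuracy even though it never finds a positive case, while precision and recall show that only the second model detects the rare class at all.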

kevinpoisson

Hey Emma, please send me the code for imbalanced image datasets.

mihretdesta

Hi, Emma! Thanks for sharing. Very helpful materials. But I ran into a problem when downloading the presentation notes: the notes for imbalanced datasets seem to be missing. When I click the imbalanced dataset notes, it actually opens the notes for encoding categorical data. Could you please help with this?

Aria-owcl

You are just reading the text written in the book. Try to explain with examples and in more detail, beyond what is already mentioned in the book.

srhrsh