This is why you should care about unbalanced data... as a data scientist

What do you do when your data has lots more negative examples than positive ones?

Comments

We just talked about this in my machine learning course this week!! Great timing! This video is very helpful.

jessibenzel

Great content, this practical content is gold. Thank you :)

haneulkim

ritvikmath coming with a video on one of my favorite topics - instant like!

pgbpro

Hi! Great video. Is there any chance you could create a full in-depth CatBoost tutorial on some random data? Would be super useful.

igorbreeze

Very interesting. AdTech modeling of conversions as caused by advertising always suffers from imbalance. (Conversion rates are usually low-to-mid single-digit percentages.)

joelrubinson

Great video. For other ML algorithms like logistic regression, SVM, KNN, etc., can we apply the first method (upweighting the minority class)? Or is this only applicable to decision trees?

davidzhang
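
One way to read the question above: upweighting the minority class amounts to giving each example a weight in the training loss, which is not specific to trees. Below is a minimal sketch for the logistic (cross-entropy) loss, assuming 0/1 labels with 1 as the minority class. The function name `weighted_log_loss` and the weight of 9 are illustrative assumptions, not something taken from the video.

```python
import math

def weighted_log_loss(y_true, p_pred, pos_weight=1.0):
    """Weighted mean binary cross-entropy; positives count pos_weight times."""
    total, weight_sum = 0.0, 0.0
    for y, p in zip(y_true, p_pred):
        w = pos_weight if y == 1 else 1.0
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum

# Hypothetical data: 1 positive, 9 negatives.
y_true = [1] + [0] * 9
# A model that scores every example as an unlikely positive.
p_pred = [0.1] * 10

unweighted = weighted_log_loss(y_true, p_pred)
weighted = weighted_log_loss(y_true, p_pred, pos_weight=9.0)

# With the positive upweighted, missing it is penalized more heavily.
print(unweighted < weighted)
```

scikit-learn exposes the same idea through the `class_weight` parameter on estimators such as `LogisticRegression` and `SVC`. KNN has no training loss, so this particular formulation does not carry over to it directly.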

Excellent video!

One question though: are certain classification models immune to class imbalance? Thanks!

bmebri

Okay, but with oversampling, how do you use cross-validation? Because if you run it on the already oversampled dataset, you'll have data leakage.

danielwiczew
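
A common way to avoid the leak described above is to split first and oversample only the training part of each fold, so that copies of a test row never end up in training. Here is a minimal stdlib-only sketch, assuming 0/1 labels with 1 as the minority class and plain random duplication as the oversampler. The helper names `k_fold_indices` and `oversample_minority` are hypothetical, and the folds are not stratified, which a real setup would probably want.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split the indices into k roughly equal folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def oversample_minority(train_idx, labels, rng):
    """Duplicate random minority (label 1) rows until both classes match in size."""
    pos = [i for i in train_idx if labels[i] == 1]
    neg = [i for i in train_idx if labels[i] == 0]
    if not pos or len(pos) >= len(neg):
        return list(train_idx)
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    return neg + pos + extra

# Hypothetical dataset: 10 positives, 90 negatives.
labels = [1] * 10 + [0] * 90
rng = random.Random(42)
folds = k_fold_indices(len(labels), k=5)

results = []
for f, test_idx in enumerate(folds):
    # Training rows are all rows outside the current test fold.
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    # Oversample AFTER the split, on training rows only.
    train_balanced = oversample_minority(train_idx, labels, rng)
    results.append((train_balanced, test_idx))
    # A model would be fit on train_balanced and scored on the untouched test_idx.
```

The test fold keeps its original class ratio, so the evaluation reflects the imbalance the model will see in practice.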

Can we customise the loss function? For example, more weight for misclassifying the true minority class and less weight for the other error?

Sameerahmed

Great video!

But don’t you think that with such an unbalanced dataset it would be better to go for an anomaly detection algorithm instead of a classification algorithm?

d.a.k.o.s

Great demo!
Just one thought: why did you not talk about downsampling the majority class and what its impact could be?

zahrashekarchi

It should be "imbalanced data" instead of "unbalanced data"

chenxiaodu

Hi, just wondering whether SMOTE is applicable to image data. I saw only one article on it online, so I am not sure if it even works, since generating synthetic images is likely much harder.

mrirror

You could predict that aircraft engines NEVER fail and almost always be right.

bernardfinucane
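
The point above can be made concrete with a little arithmetic. This is a sketch with made-up numbers (1,000 engines, 10 failures), not data from the video: a "model" that always predicts "no failure" scores very high accuracy while catching none of the failures.

```python
# Hypothetical numbers: 1,000 engines, 10 of which fail (label 1).
labels = [1] * 10 + [0] * 990

# A "model" that always predicts "no failure" (label 0).
predictions = [0] * len(labels)

# Accuracy: share of all predictions that match the label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: share of actual failures that the model catches.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.3f}")  # accuracy = 0.990
print(f"recall   = {recall:.3f}")    # recall   = 0.000
```

This is why accuracy alone says little on imbalanced data, and why metrics tied to the minority class, such as recall, are worth reporting alongside it.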

Are you familiar with latent vectors in network analysis?

Shout-out from South Africa

Septumsempra

Hi,
When people have problems with unbalanced data, it just proves they don't understand what they are doing.
When I was young (a long time ago), our teachers wanted us to do things "step by step" to be (nearly) sure we knew what we were calculating.
As that's no longer the case, people don't get the methodology and the maths but practice data science anyway, which is sad.

junkbingo