One-Hot, Label, Target and K-Fold Target Encoding, Clearly Explained!!!

In theory, discrete variables, or features, are easy to use with machine learning algorithms. However, in practice, it's not always so easy and we often have to transform discrete values, like favorite colors, into numbers. There are lots of ways to do this, and this video walks you through 3 of the most popular methods.
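As a rough illustration of turning favorite colors into numbers, here is a minimal plain-Python sketch of label encoding and one-hot encoding. The toy data and the alphabetical category order are assumptions made for this example, not something taken from the video.

```python
# Hypothetical toy data: favorite colors, as in the description's example.
colors = ["blue", "red", "green", "blue", "green"]

# Fix a category order (alphabetical here, an arbitrary choice) so the
# encodings are reproducible.
categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Label encoding: each category is replaced by one arbitrary integer.
label_map = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-hot encoding: one 0/1 column per category, exactly one 1 per row.
one_hot_encoded = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(label_encoded)    # [0, 2, 1, 0, 1]
print(one_hot_encoded)  # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Note that label encoding imposes an order (blue < green < red) that the colors do not really have, while one-hot encoding avoids that at the cost of one column per category.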

0:00 Awesome song and introduction
1:24 One-Hot Encoding
3:25 Label Encoding
4:39 Target Encoding
6:27 Target Encoding with a Weighted Mean, or Bayesian Target Encoding
9:56 K-Fold Target Encoding
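
For readers who want something concrete for the two target-encoding chapters, here is a minimal sketch of target encoding with a weighted mean. It assumes the usual smoothing formula, (n * category mean + m * overall mean) / (n + m), where n is the number of rows in the category and m is a user-chosen weight. The function name, the toy data, and the value m = 2 are hypothetical choices for illustration.

```python
from collections import defaultdict


def weighted_target_encoding(categories, targets, m=2.0):
    """Map each category to (n * category_mean + m * overall_mean) / (n + m).

    m is an assumed smoothing weight; m = 0 gives plain target encoding
    (the per-category mean of the target).
    """
    overall_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # n * category_mean is simply the sum of the targets in that category.
    return {
        cat: (sums[cat] + m * overall_mean) / (counts[cat] + m)
        for cat in counts
    }


# Hypothetical toy data: favorite color and a 0/1 target.
colors = ["blue", "blue", "blue", "red", "green", "green"]
liked = [1, 0, 0, 1, 1, 0]

smoothed = weighted_target_encoding(colors, liked, m=2.0)
plain = weighted_target_encoding(colors, liked, m=0.0)
print(smoothed)
print(plain)
```

With m = 0, "red" is encoded from a single row, so its value is just that row's target. A larger m pulls rare categories toward the overall mean, which is the motivation for the weighted version.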

#StatQuest #DubbedWithAloud
Comments

It's mind-boggling how much better Josh is at explaining complicated topics than anyone else.

matthewmechtly

This is literally the best explanation of statistics and traditional ML models. I am so lucky to have found your videos just as my data science journey started.

sungminson

Thank you so much for your videos, this is by far the best educational Machine Learning channel I’ve ever come across

bridgetelly

A much better explanation than the one I got in class!

Li-dvlr

Hi Josh, I wanted to thank you for your content. I'm finishing your stats playlist and it's very good. Statsquatch has become my friend. Big hug straight from Brazil!

LuizHenrique-qrlt

I really loved your explanation and your sense of humor. I really did!

moazhendy

Hi Josh. I just came across your channel. Your way of explaining is so concise, clear and appealing. I will definitely learn a lot from this channel.

myfoodfeast

Thank you, your explanations are always simple and clear.

amirrezaabedini

Totally great explanation, congratulations

MilenkoCurcin

Love the dry humour in your videos 🤣. Great content too!

marcom

Hey Josh, great job. Thank you a lot!

hasandaaboul

How do you use k-fold target encoding for a test data set, since blue now has several distinct numeric values as a predictor in the training set?

monkeystoot

But what happens at inference time? Say you trained a great model and now you are predicting on new data: do you use the mean of the old data or the mean of the new data? With target encoding, the new data doesn't have a target, so what now?!?

SnipeSniperNEW

Great video. 👍🏽
I find it less confusing, however, to say categorical or qualitative data instead of discrete data.
Numeric data can also be discrete (integers).

lbognini

Hey Josh,

Love the videos. I'm left with one question: is there anything we can do when we are doing multiclass classification and need to transform our predicted variable so that the algorithm isn't working with string data?

bkleinman

Great work, as always! What should we do when target encoding results in the same number for two labels?

lutzsommer

Hi Josh, a heartfelt thank you for sharing these encoding techniques.

I have one doubt; it may look stupid, but I just want to clarify it with you.
At 13:40, the encoding of the green colour with target value 1 is 0.42, and below that, the green colour with target value 1 is 0.67.
So when the encoding transforms new data, will the system change the green colour to 0.42 or 0.67?

lakshmanbharath

Hi Josh, thanks for the explanation. How should unseen data be transformed when using k-fold target encoding? Is it okay to use the mean of the transformed values for each category? Thanks in advance.

dvergnordicalfar

Thanks for a great video! I am trying to apply k-fold target encoding to my train and test data. I encoded my train data using k-fold target encoding just like in the video, but how should I encode my test data? If the feature is BLUE, should I take the mean of the target-encoded BLUE values in the train data and use it for the test data? Or should I use the whole train data to compute new target encoding values for the test data?

speedtent
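
On the train/test question in this comment, here is a hedged sketch of one common convention, not an answer taken from the video: encode each training row using only the other folds, then encode test rows with a single mapping learned from the whole training set. scikit-learn's `TargetEncoder` documents a similar split between `fit_transform` and `transform`. The function names, the `i % k` fold assignment (no shuffling), the weight m = 2, and the toy data are all assumptions for illustration.

```python
from collections import defaultdict


def fit_encoding(cats, ys, m, prior):
    """Weighted-mean encoding learned from (cats, ys); prior is the overall mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(cats, ys):
        sums[c] += y
        counts[c] += 1
    return {c: (sums[c] + m * prior) / (counts[c] + m) for c in counts}


def kfold_encode_train(cats, ys, k=2, m=2.0):
    """Encode each training row using only the rows in the other folds."""
    n = len(cats)
    encoded = [None] * n
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        rest = [i for i in range(n) if i % k != fold]
        rest_ys = [ys[i] for i in rest]
        prior = sum(rest_ys) / len(rest_ys)
        mapping = fit_encoding([cats[i] for i in rest], rest_ys, m, prior)
        for i in held_out:
            # A category missing from the other folds falls back to their mean.
            encoded[i] = mapping.get(cats[i], prior)
    return encoded


def encode_test(train_cats, train_ys, test_cats, m=2.0):
    """Encode test rows with one mapping learned from all training rows."""
    prior = sum(train_ys) / len(train_ys)
    mapping = fit_encoding(train_cats, train_ys, m, prior)
    # Categories never seen in training fall back to the overall training mean.
    return [mapping.get(c, prior) for c in test_cats]


# Hypothetical toy data.
train_colors = ["blue", "blue", "red", "green", "blue", "green"]
train_liked = [1, 0, 1, 1, 0, 0]

train_encoded = kfold_encode_train(train_colors, train_liked, k=2, m=2.0)
test_encoded = encode_test(train_colors, train_liked, ["blue", "purple"], m=2.0)
print(train_encoded)
print(test_encoded)
```

In this sketch "blue" takes more than one value in the training set, because each fold is encoded from different rows, but it takes a single value in the test set. No test targets are needed, since the test encoding is computed from training targets only.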

Hi Josh! Great video. Are you planning to add to these videos how to apply them in Python?
Thanks!

lautarocisterna