Kaggle's 30 Days Of ML (Competition Part-3): What is Target Encoding and how does it work?

This video is a walkthrough of Kaggle's #30DaysOfML. In this video, I will explain what target encoding is, and we will implement #TargetEncoding for our dataset.
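At its core, target encoding replaces each category with the mean of the target over that category. A minimal sketch (toy data and made-up column names, not the notebook's exact code):

```python
import pandas as pd

# Toy training data: one categorical column and a binary target (made-up values).
train = pd.DataFrame({
    "cat0": ["A", "A", "B", "B", "B"],
    "target": [1, 0, 1, 1, 0],
})

# Target encoding: replace each category with the mean target of its group.
means = train.groupby("cat0")["target"].mean()
train["cat0_te"] = train["cat0"].map(means)  # A -> 0.5, B -> 2/3
```

Done naively on the full training set like this, the encoding leaks target information; the video's fold-based scheme exists to avoid exactly that.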

Note: this video is not sponsored by #Kaggle!

Please subscribe and like the video to help keep me motivated to make awesome videos like this one. :)

Follow me on:
Comments

Abhishek bhai, please don't stop making videos. I followed them very regularly even though I was very tired after my working hours. I have put in a lot of dedication, and I'm pretty sure there are many others around the world doing the same. So please continue making videos for us. Until now, you have done an outstanding job!!

geekyprogrammer

I appreciate your videos, and I just got your book yesterday.

Completely unrelated question, what software do you use to make your videos with the camera overlay?

JeremyWhittakerAZ

Hi Abhishek
Thanks for the video. Quick question: how would target encoding work in real life, since we don't really have target variables in the test dataset in real life?

ram

Hi Abhishek,
Why do we need target encoding at all? What are the benefits? Can you explain briefly? Thanks

madhuful

Bhaiya, would it be possible for you to create a playlist/course on Kaggle competition-specific machine learning and deep learning? Just a request! 😅

debarchanbasu

Just to correct a mistake of yours: overfitting is when a model performs well on the training set but not on the validation set. The validation and test sets are both unseen by the model, so there is no way a model performs well on validation but badly on test; if that is happening, there is probably data leakage.

samirkhan

Thanks, Abhishek, for your great support. Regards.

vikasmishra

Abhishek Bhai, you've confused me. Out of all the encoding methods, I used the one that counts the frequency, and I think that is what is missing from your notebook. Please confirm:
for col in cat_col:
train[f"cont_{col}"] =
test[f"cont_{col}"] =

So as per my understanding, this should have been done on folds, i.e. xtrain, xvalid and xtest.
What we saw yesterday was frequency encoding, which is a way to use the frequency of the categories as labels. But what you showed us today is target encoding. Frequency encoding can also be used for categorical variables. Is my understanding correct?

yogitad

One small doubt: is so much generalization required, though? We encode every categorical variable in the training set (df) by assigning a fold number to the records and then finding its target encoding for each fold using all the other folds it is not a part of, right? And furthermore, we take the average of the target encodings we got for each fold in order to encode the same categorical column in the test set, right? My question is: why are we encoding the test set now at all? We mainly use folds to fine-tune the parameters and make sure we have the correct model, right? Once the model is set and the parameters are found, we could directly encode the test set using the entire training set instead of averaging over folds, which are themselves generalized since they contain encodings from the other folds, right?
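The fold-based scheme described above can be sketched roughly as follows (toy data, made-up column names; a minimal illustration, not the notebook's exact code):

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy data with one categorical column and a binary target (made-up values).
train = pd.DataFrame({
    "cat0":   ["A", "B", "A", "B", "A", "B", "A", "B"],
    "target": [1,   0,   1,   1,   0,   0,   1,   0],
})
test = pd.DataFrame({"cat0": ["A", "B"]})

kf = KFold(n_splits=4, shuffle=True, random_state=42)
train["cat0_te"] = 0.0
fold_encodings = []

for trn_idx, val_idx in kf.split(train):
    # Compute category means on the other folds only, so a row never
    # sees its own target value (this is what avoids leakage).
    means = train.iloc[trn_idx].groupby("cat0")["target"].mean()
    train.loc[val_idx, "cat0_te"] = train.loc[val_idx, "cat0"].map(means).values
    # Each fold also produces an encoding of the test set.
    fold_encodings.append(test["cat0"].map(means))

# The test set gets the average of the per-fold encodings.
test["cat0_te"] = pd.concat(fold_encodings, axis=1).mean(axis=1)
```

As the commenter suggests, once tuning is done one could instead encode the test set from the full training set in a single pass; averaging the fold encodings is a convenience that reuses the values already computed during cross-validation.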

vigneshbalasubramanian

Hello. In one of the lessons it was written that applying OrdinalEncoder can cause problems: for instance, if the validation data contains values that don't also appear in the training data, the encoder will throw an error, because those values won't have an integer assigned to them.
How did you deal with that? I see that you just used the encoder, without removing the bad columns.
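For reference, scikit-learn's OrdinalEncoder (0.24+) can map unseen categories to a sentinel value instead of raising an error. A small sketch with toy data:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X_train = np.array([["A"], ["B"], ["B"]])
X_valid = np.array([["A"], ["C"]])  # "C" never appears in training

# handle_unknown="use_encoded_value" maps unseen categories to unknown_value
# instead of throwing an error at transform time.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(X_train)
encoded = enc.transform(X_valid)  # "A" -> 0.0, unseen "C" -> -1.0
```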

mikayilshahtakhtinski

Thanks for the new concept. I have a doubt:
when we calculated the grouped mean on x_train, why did we map it over x_valid?

swayamsingh

Sir, can you explain target encoding again, or point to some resource that explains it? The explanation was split between two videos, and I couldn't understand target encoding from this video.

iwrestling

How do you have a dark Kaggle notebook? My eyes get burned at night!

Shubhamkumar-svty

Sir, I have a doubt: is it wrong if I do the fold split at the end of the whole process? I mean, first I do all the feature creation and selection, and then, with the dataset "ready" to train, I do the fold split along with the encoding or scaling.

maxidiazbattan

Why do we do target encoding on categorical columns? What if I apply OrdinalEncoder to the whole df at the very beginning, before the folds? Can I still use the categorical columns for target encoding if they now hold numerical values instead of text?

igordedkov