Kaggle's 30 Days Of ML (Competition Part-3): What is Target Encoding and how does it work?

This video is a walkthrough of Kaggle's #30DaysOfML. In this video, I will explain what target encoding is, and we will implement #TargetEncoding for our dataset.
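At its core, target encoding replaces each category with the mean of the target over that category. A minimal sketch (toy data and made-up column names, not the notebook's exact code):

```python
import pandas as pd

# Toy training data: one categorical column and a binary target (made-up values).
train = pd.DataFrame({
    "cat0": ["A", "A", "B", "B", "B"],
    "target": [1, 0, 1, 1, 0],
})

# Target encoding: replace each category with the mean target of its group.
means = train.groupby("cat0")["target"].mean()
train["cat0_te"] = train["cat0"].map(means)  # A -> 0.5, B -> 2/3
```

Done naively on the full training set like this, the encoding leaks target information; the video's fold-based scheme exists to avoid exactly that.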

Note: this video is not sponsored by #Kaggle!

Please subscribe and like the video to help keep me motivated to make awesome videos like this one. :)

Follow me on:
Comments

Abhishek bhai, please don't stop making videos. I followed them very regularly even though I was very tired after my working hours. I have put in a lot of dedication, and I'm pretty sure there are many others around the world doing the same. So please continue making videos for us. Until now, you have done an outstanding job!!

geekyprogrammer

I appreciate your videos, and I just got your book yesterday.

Completely unrelated question, what software do you use to make your videos with the camera overlay?

JeremyWhittakerAZ

Hi Abhishek
Thanks for the video. Quick question: how would target encoding work in real life, since we don't really have target variables in the test dataset in real life?

ram

Hi Abhishek,
Why do we need target encoding at all? What are the benefits? Can you explain briefly? Thanks

madhuful

Bhaiya, would it be possible for you to create a playlist/course on Kaggle competition-specific machine learning and deep learning? Just a request! 😅

debarchanbasu

Just to correct a mistake of yours: overfitting is when a model performs well on the training set but not on the validation set. The validation and test sets are both unseen by the model, so there is no way a model performs well on validation but badly on test; if that is happening, there is probably data leakage.

samirkhan

Thanks, Abhishek, for your great support. Regards.

vikasmishra

Abhishek Bhai, you've confused me. Out of all the encoding methods, I used the one that counts the frequency, and I think that is what is missing from your notebook. Please confirm:
for col in cat_col:
train[f"cont_{col}"] =
test[f"cont_{col}"] =

So as per my understanding, this should have been done on folds, i.e. xtrain, xvalid and xtest.
What we saw yesterday was frequency encoding, which is a way to use the frequency of the categories as labels. But what you showed us today is target encoding. Frequency encoding can also be used for categorical variables. Is my understanding correct?

yogitad

One small doubt: is so much generalization required, though? We encode every categorical variable in the training set (df) by assigning a fold number to the records and then finding its target encoding for each fold using all the other folds it is not a part of, right? And furthermore, we take the average of the target encodings we got for each fold in order to encode the same categorical column in the test set, right? My question is: why are we encoding the test set now at all? We mainly use folds to fine-tune the parameters and make sure we have the correct model, right? Once the model is set and the parameters are found, we could directly encode the test set using the entire training set instead of averaging over folds, which are themselves generalized since they contain encodings from the other folds, right?
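The fold-based scheme described above can be sketched roughly as follows (toy data, made-up column names; a minimal illustration, not the notebook's exact code):

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy data with one categorical column and a binary target (made-up values).
train = pd.DataFrame({
    "cat0":   ["A", "B", "A", "B", "A", "B", "A", "B"],
    "target": [1,   0,   1,   1,   0,   0,   1,   0],
})
test = pd.DataFrame({"cat0": ["A", "B"]})

kf = KFold(n_splits=4, shuffle=True, random_state=42)
train["cat0_te"] = 0.0
fold_encodings = []

for trn_idx, val_idx in kf.split(train):
    # Compute category means on the other folds only, so a row never
    # sees its own target value (this is what avoids leakage).
    means = train.iloc[trn_idx].groupby("cat0")["target"].mean()
    train.loc[val_idx, "cat0_te"] = train.loc[val_idx, "cat0"].map(means).values
    # Each fold also produces an encoding of the test set.
    fold_encodings.append(test["cat0"].map(means))

# The test set gets the average of the per-fold encodings.
test["cat0_te"] = pd.concat(fold_encodings, axis=1).mean(axis=1)
```

As the commenter suggests, once tuning is done one could instead encode the test set from the full training set in a single pass; averaging the fold encodings is a convenience that reuses the values already computed during cross-validation.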

vigneshbalasubramanian

Hello. In one of the lessons it was written that applying OrdinalEncoder can cause problems: for instance, if the validation data contains values that don't also appear in the training data, the encoder will throw an error, because those values won't have an integer assigned to them.
How did you deal with that? I see that you just used the encoder, without removing the bad columns.
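For reference, scikit-learn's OrdinalEncoder (0.24+) can map unseen categories to a sentinel value instead of raising an error. A small sketch with toy data:

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X_train = np.array([["A"], ["B"], ["B"]])
X_valid = np.array([["A"], ["C"]])  # "C" never appears in training

# handle_unknown="use_encoded_value" maps unseen categories to unknown_value
# instead of throwing an error at transform time.
enc = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
enc.fit(X_train)
encoded = enc.transform(X_valid)  # "A" -> 0.0, unseen "C" -> -1.0
```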

mikayilshahtakhtinski

Thanks for the new concept. I have a doubt:
when we calculated the grouped mean on x_train, why did we map it over x_valid?

swayamsingh

Sir, can you explain target encoding again, or point to some resource that explains it? The explanation was split between two videos, and I couldn't understand target encoding from this video.

iwrestling

How do you have a dark Kaggle notebook? My eyes get burned at night!

Shubhamkumar-svty

Sir, I have a doubt: is it wrong if I do the fold split at the end of the whole process? I mean, first I do all the feature creation and selection, and then, with the dataset "ready" to train, I do the fold split along with the encoding or scaling.

maxidiazbattan

Why do we do target encoding on categorical columns? What if I apply OrdinalEncoder to the whole df at the very beginning, before the folds? Can I still use the categorical columns for target encoding if they now hold numerical values instead of text?

igordedkov