Encoding Techniques for Machine Learning | Data Preprocessing

Categorical Value
Label Encoding
One Hot Encoding
Dummy Encoding
Hash Encoding
Binary Encoding
Data Preprocessing
Data Preparation
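The encodings named in the topic list above can be sketched on a toy column as follows. This is a minimal illustration assuming pandas. The data, the `binary_encode` and `hash_encode` helpers, and the `n_components=4` default are hypothetical. Libraries the video may use, such as `category_encoders`, can order categories or hash values differently.

```python
import hashlib

import pandas as pd

# Hypothetical toy data: one nominal column with four categories
df = pd.DataFrame({"Animal": ["dog", "cat", "mouse", "bird", "cat"]})

# One-hot encoding: one 0/1 column per category (k columns)
one_hot = pd.get_dummies(df["Animal"], prefix="Animal", dtype=int)

# Dummy encoding: like one-hot, but the first category is dropped and becomes
# the all-zeros baseline, leaving k - 1 columns
dummy = pd.get_dummies(df["Animal"], prefix="Animal", drop_first=True, dtype=int)


def binary_encode(values):
    """Binary encoding sketch: integer code of each category, written as bits."""
    categories = sorted(set(values))
    # Codes start at 1 so that no category is represented by all zeros
    codes = {cat: i + 1 for i, cat in enumerate(categories)}
    n_bits = max(codes.values()).bit_length()
    rows = [[(codes[v] >> b) & 1 for b in reversed(range(n_bits))] for v in values]
    return pd.DataFrame(rows, columns=[f"bit_{b}" for b in reversed(range(n_bits))])


def hash_encode(values, n_components=4):
    """Hashing-trick sketch: each category sets one of n_components columns."""
    rows = []
    for v in values:
        # md5 is stable across runs, unlike Python's built-in hash() for strings
        idx = int(hashlib.md5(v.encode("utf-8")).hexdigest(), 16) % n_components
        row = [0] * n_components
        row[idx] = 1
        rows.append(row)
    return pd.DataFrame(rows, columns=[f"hash_{i}" for i in range(n_components)])


binary = binary_encode(df["Animal"])
hashed = hash_encode(df["Animal"])

print(one_hot.columns.tolist())
print(dummy.columns.tolist())
print(binary)
print(hashed)
```

Binary encoding needs only about log2(k) columns instead of k. Hash encoding fixes the number of columns in advance, at the cost of possible collisions between categories.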
Comments

Easy to understand. That was great teaching, ma'am. Please do more videos on Data Science and Analysis, ma'am. Thank you.

nextdoortechieaminfirnash

Thank you, ma'am, for this wonderful video.

br

Hi Suganya 😊! Thank you so much for this super clear and useful video!

About feature selection for nominal categorical features (e.g. using chi², Cramér’s V, Boruta) and regularization (e.g. using Lasso, Ridge): which encoding should be used?

My problem is that nominal categorical features are usually encoded using one-hot encoding, dummy encoding, etc. However, once they have been encoded, we have ‘new features’. For instance, after one-hot encoding, the column ‘Animal’, with values ‘dog’, ‘cat’ and ‘mouse’, is represented by three new columns named after the animals, ‘col_dog’, ‘col_cat’ and ‘col_mouse’, each with values 0 or 1. So it is not pertinent to apply feature selection or regularization to these new columns, right?

On the other hand, label encoding keeps the single original column, but this encoding is meant for ordinal categorical features, right? So it does not apply to our example (dog, cat, mouse), since there is no notion of order among them.
So… what should be done in this case?

Thank you very much in advance for your very kind answer!

lourdesmartinez
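The ‘Animal’ example in the comment above can be reproduced as follows. The sketch assumes pandas, scikit-learn and SciPy, and the `Target` column and the `cramers_v` helper are hypothetical. It shows the three one-hot columns, the arbitrary alphabetical codes that `LabelEncoder` assigns, and Cramér’s V computed directly on the raw labels, with no encoding at all. It illustrates the question and is not an answer given in the video.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data for the 'Animal' example
df = pd.DataFrame({
    "Animal": ["dog", "cat", "mouse", "dog", "cat", "mouse"],
    "Target": ["yes", "no", "no", "yes", "no", "no"],
})

# One-hot encoding turns one column into three: col_cat, col_dog, col_mouse
one_hot = pd.get_dummies(df["Animal"], prefix="col", dtype=int)

# LabelEncoder assigns integers in sorted order (cat=0, dog=1, mouse=2);
# for a nominal feature this order carries no meaning
codes = LabelEncoder().fit_transform(df["Animal"])


def cramers_v(x, y):
    """Cramér's V between two categorical columns, computed on the raw labels."""
    table = pd.crosstab(x, y)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.to_numpy().sum()
    r, k = table.shape
    # Requires both columns to have at least two distinct categories
    return float(np.sqrt(chi2 / (n * (min(r, k) - 1))))


print(one_hot.columns.tolist())
print(codes)
print(cramers_v(df["Animal"], df["Target"]))
```

Here `Target` is fully determined by `Animal`, so Cramér’s V comes out as 1.0. A value near 0 would indicate no association.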

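import category_encoders as ce  # third-party package: pip install category_encoders
# Create an OrdinalEncoder with an explicit order for the 'Degree' column
# (assumes df is an existing pandas DataFrame with a 'Degree' column)
encoder = ce.OrdinalEncoder(cols=['Degree'], return_df=True,
                            mapping=[{'col': 'Degree',
                                      'mapping': {'None': 0, 'School': 1, 'Diploma': 2,
                                                  'Bachelors': 3, 'Masters': 4, 'Phd': 5}}])
df = encoder.fit_transform(df)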

saitarun
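The ordinal mapping in the comment above can also be applied with plain pandas. This is a minimal sketch under the assumption that every value in the column appears in the mapping. The sample data and the `Degree_encoded` column name are hypothetical.

```python
import pandas as pd

# Hypothetical data; the category order is the one from the comment above
df = pd.DataFrame({"Degree": ["Bachelors", "Phd", "None", "Masters", "School", "Diploma"]})

degree_order = {"None": 0, "School": 1, "Diploma": 2,
                "Bachelors": 3, "Masters": 4, "Phd": 5}

# Series.map replaces each label with its rank; labels missing from the dict become NaN
df["Degree_encoded"] = df["Degree"].map(degree_order)
print(df)
```

Unlike label encoding, the integers here follow the real order of the degrees, which is what makes ordinal encoding suitable for this column.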

This is the code for label encoding; you wrote ordinal encoding code for label encoding.


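# Import the label encoder from scikit-learn
from sklearn import preprocessing
# The LabelEncoder object maps each distinct string label to an integer (0 .. n_classes-1, in sorted order)
label_encoder = preprocessing.LabelEncoder()
# Encode labels in column 'Country' (assumes df is an existing pandas DataFrame)
df['Country'] = label_encoder.fit_transform(df['Country'])
df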

saitarun
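The following standalone sketch shows what the `LabelEncoder` snippet in the comment above does. The country values are hypothetical. The scikit-learn documentation describes `LabelEncoder` as intended for target values (y); for input columns, `OrdinalEncoder` is its counterpart.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical country values
countries = ["India", "France", "India", "Germany", "France"]

label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(countries)

# classes_ holds the sorted unique labels; each label's code is its index there
print(label_encoder.classes_)   # ['France' 'Germany' 'India']
print(codes)                    # [2 0 2 1 0]
# inverse_transform maps the integers back to the original strings
print(label_encoder.inverse_transform(codes))
```

The codes follow alphabetical order, not any meaningful ranking, which is why the comment distinguishes label encoding from the ordinal encoding shown earlier.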