Why NEVER use pandas' get dummies for creating dummy variables | Machine Learning

In this tutorial, we'll look at pandas' get_dummies function, which creates dummy variables for encoding categorical columns.
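As a minimal sketch of the concern raised in the comments, the snippet below encodes two splits separately and compares the resulting columns. The toy frames and the `color` column are made up for illustration and do not come from the video's dataset.

```python
import pandas as pd

# Hypothetical toy data: the test split happens to lack the "green" category.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["red", "blue"]})

# Encoding each split on its own only creates columns for the
# categories that are actually present in that split.
train_encoded = pd.get_dummies(train)
test_encoded = pd.get_dummies(test)

print(list(train_encoded.columns))
print(list(test_encoded.columns))
```

The two frames end up with different numbers of columns, so a model fitted on the first cannot be applied to the second without further alignment.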

I've uploaded all the relevant code and datasets used here (and in all my other tutorials, for that matter) to my GitHub page, which is accessible here:

Link:

If you like my content, please do not forget to upvote this video and subscribe to my channel.

If you have any qualms regarding any of the content here, please feel free to comment below and I'll be happy to assist you in whatever capacity possible.

Thank you!
Comments
Author

But I was taught that cleaning and encoding should be done before splitting and that scaling should be performed after. Is that wrong?

akashkunwar
Author

Hi Rachit - why not use get dummies with the entire data (before splitting into train/test)? Wouldn't it solve the potential problem? Thanks!

joeyk
Author

Hi,
Rachit Toshniwal, what a good point.

See, to overcome that, after applying get_dummies you have to align the DataFrames. If you do that, then you can run any machine learning model and it will be OK. Please see an example below:

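import pandas as pd

# One-hot encode each split separately
X_train = pd.get_dummies(X_train)
X_valid = pd.get_dummies(X_valid)
X_test = pd.get_dummies(X_test)

# Keep only the training columns; categories missing from a split are filled with 0 instead of NaN
X_train, X_valid = X_train.align(X_valid, join='left', axis=1, fill_value=0)
X_train, X_test = X_train.align(X_test, join='left', axis=1, fill_value=0)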

devpython
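To make the alignment suggested in the comment above concrete, here is a small sketch on made-up data. The `city` column and its values are hypothetical, and `fill_value=0` is used so that categories missing from a split become zeros rather than NaN.

```python
import pandas as pd

# Hypothetical toy splits: "Lima" appears only in test,
# while "Paris" and "Oslo" appear only in train.
X_train = pd.get_dummies(pd.DataFrame({"city": ["Paris", "Rome", "Oslo"]}))
X_test = pd.get_dummies(pd.DataFrame({"city": ["Rome", "Lima"]}))

# join="left" keeps exactly the training columns, in the training order.
# Columns missing from X_test are created and filled with 0;
# columns that exist only in X_test (city_Lima) are dropped.
X_train_aligned, X_test_aligned = X_train.align(
    X_test, join="left", axis=1, fill_value=0
)

print(list(X_test_aligned.columns))
```

Note that the row for "Lima" ends up with zeros in every dummy column, so the unseen category is silently treated as "none of the known categories".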
Author

learned something new today. Thank you so much

KA_
Author

Why is pd.get_dummies not working for me?

wtfashokjr
Author

But you can do get_dummies before train_test_split.

venkyvenky
Author

Hi Rachit, so what should I do to encode categorical variables while avoiding a mismatch? I'm working on a large dataset (before the splitting) and I have already missed some categories.

eleonoraocello
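One possible way to avoid the mismatch with pandas alone, offered here as a sketch and not as the video's method, is to fix the category list from the training split and reuse it for every other split. The `color` column, the toy data, and the `encode` helper are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: "purple" never appears in the training split.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"color": ["green", "purple"]})

# Learn the category list from the training split only.
color_categories = sorted(train["color"].unique())


def encode(frame: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode `color` using the categories learned from train."""
    fixed = frame.assign(
        color=pd.Categorical(frame["color"], categories=color_categories)
    )
    # get_dummies emits one column per declared category, even if a
    # category is absent from this frame; unknown values become all zeros.
    return pd.get_dummies(fixed, columns=["color"])


train_encoded = encode(train)
test_encoded = encode(test)

print(list(test_encoded.columns))
```

Both frames now share the same columns in the same order. scikit-learn's `OneHotEncoder(handle_unknown="ignore")`, fitted on the training split, is a commonly used alternative that follows the same idea.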
Author

I was doing it wrong the whole time. You saved me.

atiaspire
Author

Namaste and Thank you. Your videos are very helpful

jeweltilak
Author

What should I use when a dataset column has many categorical values (like 200)?

sandeshkharat
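For a column with many distinct values, one common option (an assumption here, not something the video covers) is to lump rare categories into a single "other" bucket before encoding. The sketch below uses a made-up `city` series and a hypothetical `min_count` threshold.

```python
import pandas as pd

# Hypothetical toy column; a real one might have hundreds of distinct values.
cities = pd.Series(["A", "A", "A", "B", "B", "C", "D"], name="city")

min_count = 2  # hypothetical threshold: keep categories seen at least twice

# Count occurrences and keep only the sufficiently frequent categories.
counts = cities.value_counts()
frequent = counts[counts >= min_count].index

# Replace every rare category with the single label "other".
lumped = cities.where(cities.isin(frequent), "other")

encoded = pd.get_dummies(lumped, prefix="city")
print(list(encoded.columns))
```

In practice the set of frequent categories would be computed on the training split only and then reused for the other splits, for the same reason discussed in the comments above.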
Author

This video is wrong and I advise you all to ignore it.

BiologyIsHot