Pandas Get Dummies | pd.get_dummies()

When you're doing machine learning, you'll work with algorithms that cannot process categorical variables directly. In this case, you need to turn your column of labels (e.g., ['cat', 'dog', 'bird', 'cat']) into separate columns of 0s and 1s, one per category. In pandas, this is called getting dummies, or creating dummy columns.

This function is heavily used in machine learning workflows. For instance, a random forest doesn't do well with columns that hold text labels. It's best to turn these into dummy indicator columns.
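
As a minimal sketch of the idea, using the label list from the description (the column name `animal` is made up for illustration), `pd.get_dummies()` produces one indicator column per distinct label. Passing `dtype=int` is an assumption made here so that the output shows 0s and 1s regardless of the pandas version.

```python
import pandas as pd

# Hypothetical column of labels, taken from the example in the description.
df = pd.DataFrame({"animal": ["cat", "dog", "bird", "cat"]})

# One indicator column per distinct label; dtype=int forces 0/1 output.
dummies = pd.get_dummies(df["animal"], dtype=int)

print(dummies)
# Columns come out in sorted order: bird, cat, dog.
# The first row ("cat") is bird=0, cat=1, dog=0.
```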

Comments

Well, kinda late to this party, by a couple of years, but VERY good, clear, explicit explanation with examples.
If I could put this all in bold to emphasize, I might... Thx for the short but deep piece, this is the missing spark some of us do not get in a timely manner.

gudguya

Hey, I read about the 'dummy variable trap', and that drop_first should help counter this! Anyway, thanks a lot, great video.

svishal

I see this channel blowing up if you keep it up!

omarmiah

The drop_first parameter reduces the amount of data to be interpreted without losing the existing significance of the data. So I guess it makes the data interpretation process more concise. Tell me if I'm getting it wrong :)

BlissOn

In a real-world project we may have more than 20 categorical features, so do we need to list all the column names in the function?

ajaykushwaha-jemw
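
On the question of listing every column: a small sketch, assuming a made-up DataFrame with two text columns and one numeric column. When `pd.get_dummies()` receives a whole DataFrame and no `columns` argument, it encodes the object, string, and category columns and leaves numeric columns alone, so the names do not have to be spelled out.

```python
import pandas as pd

# Hypothetical data: two categorical (text) columns and one numeric column.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "size": ["S", "M", "L"],
    "price": [10.0, 12.5, 9.0],
})

# No `columns=` argument: every text/category column is encoded,
# and each dummy column is prefixed with its source column name.
encoded = pd.get_dummies(df, dtype=int)

print(sorted(encoded.columns))
```

If only some of the categorical columns should be encoded, the `columns` parameter accepts a list of just those names.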

beautifully explained man, great work, thnx for making this video

pranavgoyal

Brilliant explanation, short and concise!

azuremis

If we have, for example, Age categories such as Young, Adult, and Old, should we use ordinal or nominal encoding? Please answer.

shaikhkashif

I have a question. I ran pd.get_dummies() and my categorical data remains categorical and is not converted into numerical values. My categorical values are true/false statements.

anthonymalary

we use drop_first=True to prevent duplicated information when doing deep learning.

parsiabolouki

very well explained ..!!!
thank you :)

ashishchandra

What if I have a multi-label target that is represented as a list of items (not one mutually exclusive class)? How do I use get_dummies then?
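
One possible approach to the multi-label case, offered as a sketch and not as the video's method: explode the lists into one row per item, get dummies, then collapse back to one row per original record. The `tags` column and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical multi-label column: each row holds a list of labels.
df = pd.DataFrame({"tags": [["cat", "dog"], ["bird"], ["cat"]]})

# explode() gives one row per list item and repeats the original index.
exploded = df["tags"].explode()

# Encode each item, then take the max per original index so that a row
# gets a 1 in every column whose label appeared in its list.
multi_hot = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

print(multi_hot)
# Row 0 has cat=1 and dog=1; row 1 has bird=1; row 2 has cat=1.
```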


FYI: the reason to use drop_first=True is to avoid multicollinearity issues.

harryshambaugh

When I run get_dummies on my DataFrame for gender, it returns True/False instead of numbers. Why? Can anyone explain?

emeraldpopcorniac
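
On the True/False output: since pandas 2.0 the default `dtype` of `pd.get_dummies()` is `bool`, whereas older versions returned small integers. A minimal sketch, assuming a made-up `gender` column, of two ways to get 0s and 1s:

```python
import pandas as pd

# Hypothetical gender column for illustration.
df = pd.DataFrame({"gender": ["Male", "Female", "Female"]})

# Option 1: request integers directly.
as_int = pd.get_dummies(df, columns=["gender"], dtype=int)

# Option 2: convert the default output afterwards.
converted = pd.get_dummies(df, columns=["gender"]).astype(int)

print(as_int)
```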

Seems like drop_first is used for binary categories. If you have a gender category and the options are Male and Female, it will create two dummies, one for Male and a second for Female. This is repetitive, because if Male is 0 then Female is 100% 1. These columns can be combined into one: a single Female column of 0s and 1s, 0 meaning they are male, 1 meaning they are female. That's my assumption.

JackFarah-vy
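
To illustrate the drop_first discussion in this thread, here is a small sketch reusing the cat/dog/bird labels from the description. It suggests that `drop_first=True` is not limited to binary categories: for k categories it keeps k-1 columns, and the dropped category is represented by a row of all zeros.

```python
import pandas as pd

# Hypothetical three-category column, taken from the description's example.
animals = pd.Series(["cat", "dog", "bird", "cat"], name="animal")

# Full encoding: one column per category, and every row sums to 1,
# which is the redundancy behind the "dummy variable trap".
full = pd.get_dummies(animals, dtype=int)

# drop_first=True removes the first category in sorted order ("bird").
reduced = pd.get_dummies(animals, drop_first=True, dtype=int)

print(reduced)
# The "bird" row (index 2) is cat=0, dog=0.
```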

nice video, but not the info i was seeking =(

daniloyukihara