Python Tutorial: Transforming categorical variables

---
Now that we know which variables in our dataset are categorical, we can start transforming them into numerical ones.

To transform a categorical variable into a numeric one, we first have to understand its type. There are two types of categorical variables: ordinal and nominal. Ordinal variables have two or more categories that can be ranked or ordered. In our case that is the **salary** column, where the values clearly have a logical order.
The second type is nominal, where the categories do not have any intrinsic or logical order. An example of this kind of variable in our dataset is the **department** column, as its values clearly do not have any order or rank: the sales department is not higher than hr or vice versa, and so on.
Depending on which type of categorical variable you have, there are different methods for transforming it.

For ordinal variables, we can encode the categories by converting each of them into a corresponding numeric value. There are 3 steps to accomplish that task in Python.
- First, we have to tell Python that the column salary is actually categorical. This is done using a method called **astype()**, which casts the column to the type we specify.
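
The first step above, together with one possible way to finish the encoding, can be sketched as follows. The toy DataFrame and the salary values `low`, `medium` and `high` are assumptions for illustration. Only the **astype()** step is named in the text; `cat.reorder_categories()` and `cat.codes` are one common pandas approach, not necessarily the one intended here.

```python
import pandas as pd

# Hypothetical sample data; the real dataset and its salary levels may differ.
data = pd.DataFrame({"salary": ["low", "high", "medium", "low"]})

# Step named in the text: declare the column as categorical.
data["salary"] = data["salary"].astype("category")

# Assumed continuation: state the logical order of the categories explicitly,
# because pandas otherwise sorts them alphabetically (high, low, medium).
data["salary"] = data["salary"].cat.reorder_categories(
    ["low", "medium", "high"], ordered=True
)

# Assumed continuation: replace each category with its integer position.
data["salary_code"] = data["salary"].cat.codes

print(data)
```

With this ordering, `low` is encoded as 0, `medium` as 1 and `high` as 2, so the numeric values preserve the rank of the original categories.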

The next categorical variable is nominal, as there is no order or rank between departments. This means that the encoding approach used for ordinal variables is no longer appropriate. In this case, the transformation should be accomplished through so-called dummy variables.

Dummy variables are variables that take only two values, 0 or 1. Let's say an employee is from the technical department. If we have a separate column for each department, then this employee will have a value of 1 in the column for technical and 0 in the columns of all other departments.

This means we will have to create a new DataFrame where each department is a separate column and each row is a separate employee, with a 1 in the column of their department and 0 everywhere else. While the task may seem confusing, it is very easy from a technical perspective thanks to a very convenient pandas function called **get_dummies()**.
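
A minimal sketch of the idea described above, assuming a toy DataFrame with a `department` column. The department names come from the text, while the sample rows and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one row per employee.
data = pd.DataFrame({"department": ["technical", "sales", "hr", "technical"]})

# One column per department; dtype=int gives 0/1 instead of True/False.
departments = pd.get_dummies(data["department"], dtype=int)

print(departments)
```

Each row contains exactly one 1, in the column of that employee's department. The columns are created in alphabetical order (`hr`, `sales`, `technical`).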

When dealing with dummy variables, one should be cautious of a phenomenon known as the dummy trap. This is the situation where different dummy variables convey the same information. In this example, the sample employee is from the technical department, so that is the only column with a value of 1 in the first table. In the second table, the last column is dropped, but we can still tell that the employee is from the technical department because all the other departments have a value of 0. For that reason, whenever dummies are created in situations like this, one of them can be dropped, as its information is already contained in the others.
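
The redundancy described above can be checked with a short sketch. The sample rows are hypothetical, and dropping the `technical` column mirrors the example in the text rather than any required choice.

```python
import pandas as pd

# Hypothetical sample data: one row per employee.
data = pd.DataFrame({"department": ["technical", "sales", "hr", "technical"]})
departments = pd.get_dummies(data["department"], dtype=int)

# Drop one dummy column to avoid the dummy trap.
reduced = departments.drop("technical", axis=1)

# The dropped column can be rebuilt from the remaining ones:
# an employee with 0 in every other column must be in technical.
recovered = 1 - reduced.sum(axis=1)

print(reduced)
print(recovered.tolist())
```

The recovered values match the dropped column, which is why one dummy is redundant. As an alternative, `get_dummies()` accepts `drop_first=True`, which drops the first category instead.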

Ok, time to put this into practice.
