Discretize Python Pandas Dataframe Columns into Groups (Feature Engineering/Conditional Columns)

Pandas Conditional Columns: How can we create a new pandas column whose values depend on the values of another column? Here we learn how to create columns based on conditions in pandas, filling them in with the appropriate values as needed.

This is an important skill: it teaches the basics of deriving additional columns from pre-existing columns in our dataframe, using our own set of criteria and rules for binning. Please note: this is an example of feature engineering, that is, creating new features (e.g. general_age) based on other variables we know about. Please also note that Saniya did not intend to hurt or disrespect anyone based on their age; age is truly just a number. These examples are for illustration purposes only. For instance, starting from age (a continuous variable with many possible values), how do we create 3 groups (young, middle-aged, and old) based on some rules?
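As a rough illustration of the age example, here is a minimal sketch that uses the .loc function with the thresholds given in the time stamps (30 or younger, over 30 up to 60, over 60). The sample ages and the dataframe name df are assumptions made for this sketch, and the code shown in the video may differ.

```python
import pandas as pd

# Hypothetical sample data: the ages and the name "df" are assumptions.
df = pd.DataFrame({"age": [22, 30, 31, 45, 60, 61, 78]})

# Start with an empty column, then fill it in one rule at a time with .loc.
df["general_age"] = None

# Young: age less than or equal to 30.
df.loc[df["age"] <= 30, "general_age"] = "young"

# Middle-aged: age greater than 30 and less than or equal to 60.
# Each condition needs its own parentheses when combined with &.
df.loc[(df["age"] > 30) & (df["age"] <= 60), "general_age"] = "middle-aged"

# Old: age greater than 60.
df.loc[df["age"] > 60, "general_age"] = "old"

print(df)
```

Because the three conditions do not overlap and together cover every age, each row ends up in exactly one group.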

At the end, Saniya also shows how we can extend these ideas to create "bins" from categorical data (e.g. smoking status and blood pressure level).
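The same pattern can be sketched for categorical columns. Everything specific here is an assumption made for illustration: the column names smoking_status and bp_level, the category labels, and the rules that combine them into overall_health are hypothetical, not the ones used in the video.

```python
import pandas as pd

# Hypothetical data: column names, labels, and rules below are assumptions.
health = pd.DataFrame({
    "smoking_status": ["smoker", "non-smoker", "non-smoker", "smoker"],
    "bp_level": ["high", "normal", "high", "normal"],
})

# Boolean masks make the combined conditions easier to read.
is_smoker = health["smoking_status"] == "smoker"
high_bp = health["bp_level"] == "high"

# Default label for every row not matched by a rule below.
health["overall_health"] = "fair"

# Illustrative rule 1: smoker with high blood pressure.
health.loc[is_smoker & high_bp, "overall_health"] = "poor"

# Illustrative rule 2: non-smoker with normal blood pressure.
health.loc[~is_smoker & ~high_bp, "overall_health"] = "good"

print(health)
```

Setting a default value first means that rows matching none of the rules still receive a label instead of being left empty.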

Please reach out to Saniya with any and all questions. Please subscribe for more updates.

TIME STAMPS
00:00 Discretize Numeric Python Pandas Dataframe Columns Using Grouping Values (Feature Engineering)
03:51 Up the Movie for ages :)
06:24 Goal is to discretize age variable into buckets ("general_age" variable)
08:47 The 3 Categories for Age variable (young, middle-aged, old)
11:11 Discrete versus Continuous Random Variables
23:18 Continuing to explain 3 age categories of interest
24:29 How does the .loc function for dataframes help extract specific rows meeting a condition (e.g. row index #)?
25:25 General idea for the Python code needed
32:35 Binning age for Young ages (age less than or equal to 30 years)
42:17 Binning age for Middle-aged (age greater than 30 and less than or equal to 60 years)
53:45 Binning age for Old ages (age greater than 60)
59:59:52 Example of what students had previously done incorrectly
01:01:76 Example of applying these rules for categorical data (e.g. smoking status and blood pressure level to define overall health)