Data Science Project from Scratch - Part 3 (Data Cleaning)
This is part 3 of the Data Science Project from Scratch series. In this video, I walk through how to clean your data so that it is usable for exploratory data analysis (EDA) and model building.
Data cleaning is an extremely important and often overlooked step in the data science lifecycle. Python has handy functions that let you parse and replace data relatively easily. Regular expressions can do this too, but they are beyond the scope of this video. I mostly use lambda functions because I find them the simplest approach.
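As an illustration of the lambda-based parse-and-replace style, here is a minimal sketch assuming a pandas DataFrame with a hypothetical `Salary Estimate` column whose strings look like `"$53K-$91K (Glassdoor est.)"`. The column names and sample values are assumptions for illustration, not the video's actual data.

```python
import pandas as pd

# Hypothetical raw salary strings; the exact format is an assumption.
df = pd.DataFrame({"Salary Estimate": [
    "$53K-$91K (Glassdoor est.)",
    "$63K-$112K (Glassdoor est.)",
    "$80K-$90K (Employer est.)",
]})

# Keep only the text before the first "(" and trim surrounding whitespace.
df["salary_text"] = df["Salary Estimate"].apply(lambda x: x.split("(")[0].strip())

# Remove the "$" and "K" characters so only the numeric range remains.
df["salary_text"] = df["salary_text"].apply(lambda x: x.replace("$", "").replace("K", ""))

print(df["salary_text"].tolist())  # ['53-91', '63-112', '80-90']
```

Each lambda does one small string operation, which keeps the steps easy to read and debug compared with a single regular expression.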
The first thing we clean is the salary. It has to be numeric because it is our dependent variable. We also do some light feature engineering, pulling out information such as the state each job is located in and the nature of the job postings themselves.
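A minimal sketch of turning an already-trimmed salary range into numeric features and extracting a state feature, assuming hypothetical `salary_text` values (in thousands of dollars) and a hypothetical `Location` column formatted as `"City, ST"`. The derived column names (`min_salary`, `max_salary`, `avg_salary`, `job_state`) are illustrative choices.

```python
import pandas as pd

# Hypothetical trimmed salary ranges (thousands of USD) and "City, ST" locations.
df = pd.DataFrame({
    "salary_text": ["53-91", "63-112", "80-90"],
    "Location": ["Albuquerque, NM", "Linthicum, MD", "Clearwater, FL"],
})

# Split each range on "-" and cast both ends to int so salary is numeric.
df["min_salary"] = df["salary_text"].apply(lambda x: int(x.split("-")[0]))
df["max_salary"] = df["salary_text"].apply(lambda x: int(x.split("-")[1]))

# Use the midpoint of the range as a single numeric target.
df["avg_salary"] = (df["min_salary"] + df["max_salary"]) / 2

# Take the text after the last comma as the state abbreviation.
df["job_state"] = df["Location"].apply(lambda x: x.split(",")[-1].strip())

print(df[["avg_salary", "job_state"]])
```

Using the midpoint is one simple way to collapse a range into a single dependent variable; it assumes every row has a well-formed `low-high` range.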
I checked whether each posting mentioned Python, R Studio, Spark, AWS, or Excel and added each one as a feature.
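The keyword features can be sketched as 0/1 flags, assuming a hypothetical `Job Description` text column. The flag column names and the alternative spellings searched for (for example both `"r studio"` and `"r-studio"`) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical job descriptions.
df = pd.DataFrame({"Job Description": [
    "Experience with Python and AWS required.",
    "Proficient in R Studio and Excel.",
    "Build pipelines in Spark; SQL a plus.",
]})

# Hypothetical flag columns mapped to the lowercase spellings to search for.
keywords = {
    "python_yn": ["python"],
    "r_yn": ["r studio", "r-studio"],
    "spark_yn": ["spark"],
    "aws_yn": ["aws"],
    "excel_yn": ["excel"],
}

for col, terms in keywords.items():
    # 1 if any spelling appears in the lowercased description, else 0.
    # The default argument binds the current `terms` inside the lambda.
    df[col] = df["Job Description"].apply(
        lambda text, terms=terms: int(any(t in text.lower() for t in terms))
    )

print(df.drop(columns="Job Description"))
```

Plain substring matching is simple but coarse: `"excel"` also matches `"excellent"`, so the flags are worth spot-checking before modeling.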
Again, this is an iterative and rather messy process. Stay tuned for part 4 of the series: EDA!
#DataScience #KenJee #DataScienceProject
Partners & Affiliates
MORE DATA SCIENCE CONTENT HERE:
Check These Videos Out Next!
My Playlists