Data Science Project from Scratch - Part 3 (Data Cleaning)

This is part 3 of the Data Science Project from Scratch Series. In this video I go through how to clean up your data to make it usable for exploratory data analysis (EDA) and model building.

Data cleaning is an extremely important and often overlooked step in the data science lifecycle. Python has some handy functions that allow you to parse and replace data relatively easily. You can also use regular expressions to do this; however, those are a bit beyond the scope of this video. I mostly use lambda functions because I think that this is the simplest approach.
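As a rough illustration of the lambda-versus-regex choice mentioned above, here is a minimal sketch. The sample strings and the names `strip_note` and `strip_note_re` are made up for this example and are not taken from the video's dataset.

```python
import re

# Hypothetical raw values; the real dataset's columns and formats may differ.
raw_titles = [
    "Data Scientist (Remote)",
    "Senior Data Analyst",
    "ML Engineer (Contract)",
]

# Lambda approach: keep everything before the first "(" and trim whitespace.
strip_note = lambda text: text.split("(")[0].strip()

# Regex approach: delete a trailing parenthesised note and the spaces around it.
strip_note_re = lambda text: re.sub(r"\s*\([^)]*\)\s*$", "", text).strip()

cleaned = [strip_note(t) for t in raw_titles]
cleaned_re = [strip_note_re(t) for t in raw_titles]
```

Both versions give the same result on these samples. With pandas, either lambda could be passed to `Series.apply` on a text column.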

The first thing that we clean is the data science salary. We need to make sure that it is numeric because we are using that as our dependent variable. We also want to go through and do some light feature engineering. We can get some info about the state of the job postings and the nature of the job postings themselves.
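The salary and location steps could look something like the sketch below. It assumes a salary string shaped like "$53K-$91K (Glassdoor est.)" and a location shaped like "City, ST", and it reads "state" as the US state of the posting; those formats, that reading, and the names `parse_salary` and `job_state` are assumptions for illustration, not details confirmed by the description.

```python
def parse_salary(raw):
    """Return (low, high, average) in thousands of dollars, or None if unparseable."""
    # Drop any trailing parenthetical note, e.g. "(Glassdoor est.)".
    text = raw.split("(")[0]
    # Remove the currency symbol and the thousands marker.
    text = text.replace("$", "").replace("K", "").replace("k", "").strip()
    parts = text.split("-")
    if len(parts) != 2:
        return None
    try:
        low, high = int(parts[0]), int(parts[1])
    except ValueError:
        # Covers placeholder values such as "-1" that are not real ranges.
        return None
    return low, high, (low + high) / 2


# Assumed location format "City, ST"; anything without a comma yields None.
job_state = lambda location: (
    location.split(",")[-1].strip() if "," in location else None
)

salary = parse_salary("$53K-$91K (Glassdoor est.)")
missing = parse_salary("-1")
state = job_state("New York, NY")
no_state = job_state("Remote")
```

Returning `None` for rows that cannot be parsed makes it easy to drop or inspect them before using the average as the dependent variable.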

I went through and looked to see if the postings had python, r-studio, spark, aws, or excel listed and added those as features.
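One way to build those keyword features is a simple substring check on the lower-cased job description, sketched below. The `KEYWORDS` mapping, the spelling variants for RStudio, and the `skill_flags` helper are assumptions for illustration and may not match what the video does.

```python
# Feature name -> spellings to look for (hypothetical variants).
KEYWORDS = {
    "python": ("python",),
    "r_studio": ("r studio", "r-studio", "rstudio"),
    "spark": ("spark",),
    "aws": ("aws",),
    "excel": ("excel",),
}


def skill_flags(description):
    """Return a dict of 0/1 flags, one per keyword group."""
    text = description.lower()
    # Plain substring matching: simple, but "excel" also matches "excellent".
    return {
        name: int(any(variant in text for variant in variants))
        for name, variants in KEYWORDS.items()
    }


flags = skill_flags("Experience with Python, Spark and AWS required. RStudio is a plus.")
excel_only = skill_flags("Strong Excel skills")
```

Substring checks are quick to write but can over-match; a word-boundary regular expression would be a stricter alternative if false positives turn out to matter.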

Again, this is an iterative approach that is rather messy. Please stay tuned for part 4 of the series: EDA!

#DataScience #KenJee #DataScienceProject

Comments

I really like that you don't edit things out. I think the process is much more informational than just the result. This is a good mini-series man, keep it up!

Gyninku

I used to hate lambda functions, but I really got a good understanding of them from this video.
Thank you so much!

samehsayed

I really shouldn’t have laughed at 30:16 when Ken’s like “yea sorry about the sirens. You know, tough times out there”. But it’s nice comic relief when you’re slogging through code like that. Ken, you do a GREAT job at explaining the process end to end! I love this mini-series.

christianscodecorner

It was nice to see near the end of the video that even you had to look something up! It further backed up your video saying that no real data scientist memorizes everything. We appreciate you for all that you're doing!

Mario-oxdm

This is outstanding content. Watching this series before I have started on a project will save me so much time.

gwbraders

I am absolutely LOVING this series! I've been studying for a while and always wondered how an actual data scientist would use all these tools on an actual project. Great work Ken! Just found your channel today and I have a feeling I'll go through your videos really fast

rayneto

This series is exactly what I wanted, and I highly recommend it to anyone who is entering data science.

chirag

Excellent work, Ken. Your 'soup to nuts' approach is extremely helpful to those like me who are brand new to data science. On top of everything, I like how you show us your use of Git and GitHub as well as all of the lambda functions. Keep it up!

MichaelCruz-rchb

Things every data science beginner needs.
Thanks, man. Keep it up :)

salikmalik

I first went through a couple of minutes of the video where you discussed what had to be done and started solving it on my own. Once I was done, I came back here to check how you'd solved it. This was immensely helpful as my code was not as efficient, and I learnt better approaches to the same problem! Thanks Ken!

ashikka

I love your commitment to teaching.
Replying to and liking each and every comment is not so easy,
but that's what makes you special.
Keep growing and keep sharing knowledge.
I hope one day you will reach your expectations.
Thank you alooootttt

karthikc

The data cleaning process is a very important step and can be very tough at times, depending on how messy our dataset is. Thanks for the detailed video Ken!

importdata

Hi Ken, I've been following your work for quite some time now. The way you keep your online presence is inspirational. I would love it if you had more step-by-step project videos on YouTube. There are so many areas in which I don't even know if analytics are applied. Sports analytics is interesting to me, for example. Also, projects with more practical implications like the regular churn model or HR analytics would add value to your channel, too. I would enjoy watching them on my end at least.

turquoisetravels

Hi Ken... This is really very informative for me as a newbie in DS. Glad to see that, as comfortable as you are using lambdas instead of RegEx to clean up data with this one, you still had to google how to drop a column. :) I thought it was very cool that you didn't edit it out, and it makes me feel better. Very encouraging... keep it up! Thanks a lot for sharing.

limeyboo

I still don't have enough experience in using Python but am amazed at all the "magic" that it can do! Great job, Ken! You're amazing! :)

miguelrosales

Alternate title: "Part 3 (Lots of lambda functions)" haha! Great video Ken, really enjoyed the on-the-spot feature engineering

Sambungus

You have ended the hate I have had for Lambda functions (just because I couldn't understand them) in the first 20 mins of the video. Thank you!

joseenrique

I'm a DS Major and this is very helpful. I can see your channel blowing up when all the software engineers are switching over to DS! lol

Itsdanielpeng

This is just amazing, Ken! Now I know that's the kind of job I wanna do for a living :) Thank you so much for sharing!

elyazidassade

A good series explaining the data science process, which a lot of videos and articles ignore. This video in particular would be the time to mention using an existing codebook for the dataset and/or creating one.

xA