Data Science Fundamentals: Data Cleaning in Python

This is the third video in my Data Science Fundamentals series. In it I walk through the most important data cleaning techniques using pandas. Data cleaning is an extremely important process in data science. There is an old adage in data science, "garbage in, garbage out": if we don't provide clean data to our models, we will get poor results. Data cleaning is an essential skill for becoming a great data scientist. This video will show you how to clean data by removing and/or imputing null values, cleaning and standardizing data types, and using graphs to understand anomalies in your data.

#DataScience #DataScienceFundamentals #DataCleaning #Python

Concepts Shown:
1:53 Read in the data
2:55 Understand features of the data set
3:25 Remove duplicates from data set
4:15 Finding columns with null values & finding the % null in each column
6:50 Removing null values
10:00 Imputing null values
12:45 Cleaning text data
15:30 Converting between data types
22:20 Box plots and histograms
25:00 Normalizing outliers
30:30 Feature scaling Min-Max
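
The first few steps in the list above can be sketched in pandas roughly as follows. This is a minimal illustration, not the code from the video: the toy DataFrame, its column names, and the choice of the median for imputation are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the data set used in the video is not reproduced here.
df = pd.DataFrame({
    "price": [5000.0, 5000.0, np.nan, 12000.0, 300000.0],
    "cylinders": ["4 cylinders", "4 cylinders", "6 cylinders", None, "8 cylinders"],
    "description": [" Clean CAR ", " Clean CAR ", "One Owner", "low miles", None],
})

# Remove exact duplicate rows and rebuild the index.
df = df.drop_duplicates().reset_index(drop=True)

# Percentage of null values in each column.
pct_null = df.isnull().mean() * 100

# Remove rows that are missing a field we cannot do without
# (assumption: "description" is treated as required here).
dropped = df.dropna(subset=["description"])

# Impute the remaining missing prices with the column median
# (assumption: the median is one reasonable choice, not the only one).
imputed = dropped.assign(price=dropped["price"].fillna(dropped["price"].median()))

# Clean text: trim surrounding whitespace and lowercase.
imputed = imputed.assign(description=imputed["description"].str.strip().str.lower())

print(pct_null)
print(imputed)
```

Whether to drop or impute depends on the column: dropping loses rows, while imputing keeps them at the cost of inventing values.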

#KenJee

Comments

This video is a good basis to work from. I learned the step-by-step data cleansing method applied by KJ and created my own Data Cleansing Framework from it. Thanks man! Awesome! From Faizal, Malaysia

faizalshebli

These longer coding videos are fire, thanks

JLU

Thank you, Ken... This is a really good and interesting playlist. I am learning much about the various operations and commands through this playlist.

piyushzope

Great session. A request from my side: can you make a video on the basics of sklearn?

shivamkumarsakolia

Nice content, easy to understand and packed with useful details that I'm going to be able to put to work. Thank you so much for contributing your time in showing us all. Great stuff.

ttovar

Love it. This is just exactly what i need. Thank you so much !!!

Hope your channel will have many viewers .

hangpham

Good tutorial Ken, very clear, thx a lot!

carlosroquesuarezgurruchag

Learned some great cleaning skills on datasets. thanks

kelline

15:49 - A simpler way to go would be: df.description.apply(lambda x: str(x).lower())

rodrigokk
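
For comparison with the suggestion above, here is a small sketch of two ways to lowercase a text column. The sample Series is made up for illustration. One difference worth knowing: wrapping each value in `str()` turns a missing value into the literal string "nan", while the vectorized `.str.lower()` leaves it missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column with one missing value.
s = pd.Series(["Clean CAR", np.nan, "One Owner"], name="description")

# Element-wise apply: str(np.nan) becomes the text "nan".
via_apply = s.apply(lambda x: str(x).lower())

# Vectorized string accessor: missing values stay missing.
via_str = s.str.lower()

print(via_apply.tolist())
print(via_str.isna().tolist())
```

Which behavior is preferable depends on whether null values have already been handled earlier in the cleaning process.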

Hi Ken, thanks for making these tutorial videos, they are very detailed, informative and helpful!! Thanks for your efforts. A quick question: at around 21-22 minutes, you use data.cylinders = pd.to_numeric(data.cylinders, errors="coerce"). I do not really understand what this function does; can you explain it? Thanks

crystalzhangy
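
As a rough illustration of the call asked about above: `pd.to_numeric` converts values to a numeric dtype, and `errors="coerce"` tells it to replace anything it cannot parse with NaN instead of raising an error. The sample values below are invented for the example and are not taken from the video's data.

```python
import pandas as pd

# Hypothetical column: numbers stored as text, plus one unparseable entry
# and one missing value.
cylinders = pd.Series(["4", "6", "other", None])

# With errors="coerce", values that cannot be parsed become NaN.
# The default, errors="raise", would raise a ValueError on "other".
numeric = pd.to_numeric(cylinders, errors="coerce")

print(numeric)
```

After the conversion the column is numeric, so it can be summarized, plotted, or imputed like any other numeric feature.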

Hi Ken, great content as usual! Thank you so much it helps me a lot!

felixnicholas

Liked your video.. again..
Thank u for ur efforts.
Really appreciate it. 👍

ArbazKhan-olsz

Hey Ken these videos are really great. It's nice that I'm able to follow almost everything you're doing already. One question is, I know these videos are Fundamentals of Data Science, but how much farther does your day to day Data Science job go from here?

AndrewAlarcon

Hey Ken ! Great video.

I was confused by the code for scaling the data. Can you explain how exactly it works?

chaitanyamundle
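
On the scaling question above: min-max scaling maps each value x to (x - min) / (max - min), so the smallest value becomes 0 and the largest becomes 1. The sketch below shows that formula on made-up numbers; it may not match the exact code used in the video.

```python
import pandas as pd

# Hypothetical price column.
price = pd.Series([5000.0, 8500.0, 12000.0])

# Min-max scaling: subtract the minimum, then divide by the range.
scaled = (price - price.min()) / (price.max() - price.min())

print(scaled.tolist())
```

Scaling this way puts features with very different units on a comparable 0 to 1 range. Note that a single extreme outlier will compress the rest of the values toward 0, which is one reason to deal with outliers first.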

18:48 - why would you apply tolower again? (Wasn't it already done for the whole data frame?)
19:37 - what's the use of doing data.cylinders = data....

GusMD

appreciate the detailed step by step tutorial! one concern though, for the cylinder column, would it make sense to fill na with the mean cylinder values?

davidyoo

thankyou, this is very helpful, also could you make a video on data visualization :)

mehvishmeraj

Hi Ken, thank you for your great content.
I am new to data science, and data exploration and data cleaning are where I've been stuck for a while now. Are there any books or resources that explain things step by step?
One more thing: I understood your code, but I don't get the meaning behind it, especially the data scaling.

tuvantran

Maybe it was an update, but the function .desc didn't work here. I had to type .description

Guilhermeetimao

#35: Useful if you did your first projects on Data Cleaning, and want feedback on what you did vs. what is common to do. For me, some concepts click! Thanks Ken!!!! #66daysofdata

a_vickyp