Clean MESSY String Data in Pandas

When coding in #python and #pandas, you can easily clean messy string columns with a few built-in methods. Here we show an example of cleaning address values to make them more standardized. #datascience
Comments
Author

I love pandas for cleaning up problematic data sets before feeding them into a model. It's just so dang satisfying to see messy data turn into nice consistent points on a scatter plot :D

ErulianADRaghath
Author

I would omit the " " in str.split (it defaults to split on all whitespace). Though it doesn't matter that much if you do n=1, if you have messy data, chances are there are double spaces, too, which may give you empty strings that may cause issues down the line.

Fubbel
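
To illustrate the point about the separator, here is a minimal sketch. The Series and its values are made up for illustration and are not the data from the video.

```python
import pandas as pd

# Hypothetical messy addresses; the second one contains a double space.
addresses = pd.Series(["123 Main St", "456  Oak Rd"])

# Explicit " " separator: only the first space is consumed, so the
# second space survives at the start of the remainder.
explicit = addresses.str.split(" ", n=1)

# No separator: any run of whitespace counts as a single delimiter.
default = addresses.str.split(n=1)

print(explicit.tolist())  # [['123', 'Main St'], ['456', ' Oak Rd']]
print(default.tolist())   # [['123', 'Main St'], ['456', 'Oak Rd']]
```

Without n=1, splitting on " " would instead leave an empty string between the two spaces.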
Author

This content is exactly what I needed. Thank you!

kmateti
Author

Nice! I would also recommend using chaining to make it a bit more readable

UnholyRenton
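
As a rough sketch of what chaining could look like: the DataFrame, the column name Address, and the specific cleaning steps below are assumptions for illustration, not the exact code from the video.

```python
import pandas as pd

# Hypothetical input with stray whitespace and inconsistent casing.
df = pd.DataFrame({"Address": ["  123 main st.", "456 OAK rd  "]})

# Each step returns a new Series, so the calls can be chained
# inside parentheses, one operation per line.
cleaned = (
    df["Address"]
    .str.strip()                                 # trim outer whitespace
    .str.title()                                 # normalize casing
    .str.replace("St.", "Street", regex=False)   # literal replacement
    .str.replace("Rd", "Road", regex=False)      # literal replacement
)

print(cleaned.tolist())  # ['123 Main Street', '456 Oak Road']
```

Note that a literal replacement of a short token such as "Rd" would also match inside longer words, so real data may need a stricter pattern.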
Author

I'm just getting started in this world; this is very helpful for me.
Thanks

daironperezfrias
Author

Nice video 👍 I don't know how to code, but I can relate this to MS Excel.

rahul
Author

What does the $ do in the strings, since regex is set to False?

Fine_Mouche
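
For reference, a small sketch of how "$" behaves in pandas' str.replace under each setting. The pattern "St$" and the sample values are assumptions chosen to show the difference, not necessarily what the video used.

```python
import pandas as pd

# Hypothetical values: one ends in "St", one contains a literal "St$".
addresses = pd.Series(["12 St Stephens St", "Unit St$ 4"])

# regex=False: "$" is an ordinary character, so only the literal
# three-character text "St$" is replaced.
literal = addresses.str.replace("St$", "Street", regex=False)

# regex=True: "$" anchors the match to the end of the string, so only
# a trailing "St" is replaced.
anchored = addresses.str.replace("St$", "Street", regex=True)

print(literal.tolist())   # ['12 St Stephens St', 'Unit Street 4']
print(anchored.tolist())  # ['12 St Stephens Street', 'Unit St$ 4']
```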
Author

That data looks so clean I'm jealous. This wouldn't work on the address data I deal with

LethalLuggage
Author

The second line of code raises an error for me (TypeError: string indices must be integers). Does anyone know why this happens? When I'm not trying to reassign the column it works just fine.

littlepianist
Author

Any suggestions on dealing with date strings? I can't seem to parse them into a date object to save my life. The formats are all over the place: mmddyyyy, yyyymmdd. Nightmare.

pewster
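
One possible approach to mixed date formats, sketched under the assumption that only the two layouts mentioned above occur: parse the column once per format with errors="coerce", then fill the gaps from one attempt with the other. The sample values are made up.

```python
import pandas as pd

# Hypothetical column mixing mmddyyyy and yyyymmdd, plus one bad value.
dates = pd.Series(["12252023", "20231225", "not a date"])

# Try each format separately; values that do not fit become NaT
# instead of raising an error.
as_mdy = pd.to_datetime(dates, format="%m%d%Y", errors="coerce")
as_ymd = pd.to_datetime(dates, format="%Y%m%d", errors="coerce")

# Prefer the mmddyyyy result and fall back to yyyymmdd where it failed.
parsed = as_mdy.fillna(as_ymd)

print(parsed)
```

A string that happens to be valid under both formats is resolved by whichever format is tried first, so the order is a judgment call that depends on the data. Rows that are still NaT afterwards are worth inspecting by hand.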
Author

Can I know what kind of software you're using, please?

vinikun
Author

Next video: how to get time control of shorts on youtube with Python ;)

alejandropu
Author

Great video! Take a look at my Pandas tutorial if you want.

MachineLearningPro
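Author

# With regex=True the keys are patterns: escape the dot and use word boundaries
dict_rp = {r'\bSt\.': 'Street',
           r'\bRd\b': 'Road'}

# replace() returns a new Series, so assign it back to keep the result
df_data['Address'] = df_data['Address'].replace(dict_rp, regex=True)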

rikaminski
Author

The "ohh it looks better now" feeling after cleaning up some dogshit data.

Xarxes
Author

Bro, some condos and apartments are labeled by the half (i.e. 354.5 urmoms lane). You just messed it all up in a matter of 10 seconds.

br
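
If fractional house numbers like these matter, one hedged option is to extract the number with a pattern that allows an optional decimal part instead of assuming an integer. The sample addresses and the column names number and street are assumptions for illustration.

```python
import pandas as pd

# Hypothetical addresses, one with a fractional house number.
addresses = pd.Series(["354.5 Elm Lane", "12 Main St"])

# Named groups become column names; the decimal part is optional.
parts = addresses.str.extract(
    r"^(?P<number>\d+(?:\.\d+)?)\s+(?P<street>.+)$"
)

# Convert to float rather than int so "354.5" is preserved.
parts["number"] = parts["number"].astype(float)

print(parts)
```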