Introduction to Data Science with Python – Preprocessing Dirty Data with Pandas

This tutorial shows how to use Pandas (a data manipulation and analysis library built on top of the Python programming language) efficiently for data preprocessing. We will look at how to think through and implement data cleaning tasks. We will formulate hypotheses about the data and justify why each one holds. Important concepts in data preprocessing will be discussed:
- Checking the data types of fields
- Dealing with missing values
- Splitting a single column into multiple independent fields
- Removing irrelevant columns

Improve your data preprocessing skills so that you can quickly get to the insight in your data. How error-free and insight-rich your analysis is depends on how well you preprocess the data.
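
The four preprocessing tasks listed above can be sketched roughly as follows. This is only a minimal illustration, not the code from the video: the DataFrame, its column names (id, location, notes) and its values are made up for the example, and filling missing values with a placeholder is just one of several possible strategies.

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "location": ["Lagos: Nigeria", "Accra: Ghana", None],
    "notes": ["n/a", "n/a", "n/a"],
})

# 1. Check the data types of fields, then convert where appropriate.
print(df.dtypes)
df["id"] = df["id"].astype(int)

# 2. Deal with missing values: count them per column, then fill with a placeholder.
print(df.isna().sum())
df["location"] = df["location"].fillna("Unknown: Unknown")

# 3. Split a single column into multiple independent fields (colon as delimiter).
df[["city", "country"]] = df["location"].str.split(": ", expand=True)

# 4. Remove columns that are irrelevant to the analysis.
df = df.drop(columns=["location", "notes"])
print(df)
```

Whether to fill, drop, or flag missing values depends on the hypothesis you hold about why they are missing, so the placeholder here should be read as an assumption rather than a recommendation.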

Some Pandas Methods (Functions) Discussed

Regular Expression Techniques
- captured group
- end-of-string anchor ($ dollar sign)
- lookbehind assertion
- word character (backslash lower case w)
- whitespace character (backslash lower case s)
- question mark quantifier (match zero or one time)
- asterisk quantifier (match zero or more times)
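
The regular expression techniques listed above can be tried out with Python's built-in re module. The sketch below uses a made-up sample string and is not the data from the video; Pandas string methods such as Series.str.extract accept the same pattern syntax.

```python
import re

# Hypothetical sample string, used only for illustration.
text = "Name: Ada Lovelace"

# Captured group, with \s* (zero or more whitespace) and \w* (zero or more word characters).
first = re.search(r":\s*(\w*)", text).group(1)

# $ anchors the match at the end of the string, so this finds the last word.
last = re.search(r"\w*$", text).group()

# Lookbehind assertion: match word characters only when preceded by a colon and a whitespace.
after_colon = re.search(r"(?<=:\s)\w*", text).group()

# ? quantifier: the preceding character may appear zero or one time.
spellings = re.findall(r"colou?r", "color colour")

print(first, last, after_colon, spellings)
```

A captured group and a lookbehind can often extract the same substring; the group keeps the surrounding context in the overall match, while the lookbehind leaves it out.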

Python Function
- dir – to get a glimpse of the objects in a module
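
As a small illustration of dir, the sketch below lists the names defined in the standard re module. The choice of module is an assumption made to keep the example dependency-free; the same call works on any imported module, such as pandas.

```python
import re

# dir() returns a sorted list of the names defined in a module.
names = dir(re)

# Filter out private and dunder names to see only the public objects.
public = [name for name in names if not name.startswith("_")]
print(public[:10])
```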

Timestamps
00:00 Intro
01:21 Jupyter and Import Pandas Library
02:18 Read Data into Pandas DataFrame
04:18 Count and Find Characters in a String
09:44 Split Column using Colon as Delimiter
12:16 Data Familiarization
13:30 Multiple Steps to Extract Substring – Regular Expression
16:50 Single Step to Extract Substring – Regular Expression
18:31 Split Column into Multiple Fields
21:50 Data Cleaning
25:10 Conclusion
