Data Cleaning In Python Pandas (Beginners Data Analyst Tutorial)

In this video, you'll learn the basics of a Python library I use a lot in my data projects for ETL, data cleansing, and report automation.
It’s called Pandas, and it’s like a smarter, more muscular, and cuter version of Excel VBA and SQL. You can use Pandas to query data as you would with SQL, and you can use it for data cleansing and data analysis as you would in Excel and VBA: sorts, filters, removing duplicates, and formulas.
It’s an amazing tool to learn, and this video is a quick crash course on the basics of Pandas, where you’ll learn the following:
0:00 Intro
1:47 Install Python
2:57 Install Our Code Editor (Jupyter Notebook)
3:18 Install Pandas
3:28 Launch & Use The Code Editor
6:24 Import Data
10:19 Add new columns
11:29 Convert data types
13:07 Sort data
14:38 Remove duplicates
16:33 Joins
19:12 Replace values
20:11 Use If statements
23:29 Export our data to CSV
25:15 Outro
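To give a taste of a few of those chapters before the video, here is a minimal sketch of converting a data type, sorting, and removing duplicates. The rows and column names are made up for illustration and are not the dataset from the video.

```python
import pandas as pd

# Made-up rows for illustration; the column names are assumptions.
df = pd.DataFrame({
    "Driver ID": ["3", "1", "2", "1"],
    "Last Name": ["Lim", "Cruz", "Diaz", "Cruz"],
})

# Convert data types: the IDs arrive as text, so cast them to integers.
df["Driver ID"] = df["Driver ID"].astype(int)

# Sort data: order the rows by Driver ID, smallest first.
df = df.sort_values("Driver ID")

# Remove duplicates: drop rows that repeat an earlier row exactly,
# then renumber the index so it runs 0, 1, 2 again.
df = df.drop_duplicates().reset_index(drop=True)

print(df)
```

This is roughly what Sort and Remove Duplicates do in Excel, just written as code you can rerun on next month's file.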
The dataset we’ll be using for this crash course is made-up driver license data in two CSV files: one holds each driver’s contact details, and the other holds their license details.
As a data analyst, the data cleaning and ETL requirements you’ve been given are to:
-Add a column for Full Name by concatenating First & Last Name
-Remove the Time from Birthdate, so it shows the date only, without the trailing zeroes
-Join the Driver & License Files using a Left Join
-Replace the 1 & 2 values in Test Passed with No & Yes
-Set License Status to Valid if Test Passed is Yes & Points is less than 12; otherwise it is Invalid
-Split & export the data into 2 CSV files: 1 with Valid statuses only, and the other with Invalid statuses only.
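The requirements above can be sketched end to end in a few lines of Pandas. This is one possible approach, not necessarily the exact code from the video: the sample rows, the `Driver ID` join key, the column names, and the output file names are all assumptions made for illustration.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for the two CSV files (column names are assumptions).
drivers_csv = io.StringIO(
    "Driver ID,First Name,Last Name,Birthdate\n"
    "1,Ana,Cruz,1990-05-01 00:00:00\n"
    "2,Ben,Diaz,1985-11-23 00:00:00\n"
    "3,Cara,Lim,2000-02-14 00:00:00\n"
)
licenses_csv = io.StringIO(
    "Driver ID,Test Passed,Points\n"
    "1,2,3\n"
    "2,2,14\n"
    "3,1,0\n"
)
drivers = pd.read_csv(drivers_csv)
licenses = pd.read_csv(licenses_csv)

# 1. Full Name: concatenate First & Last Name with a space in between.
drivers["Full Name"] = drivers["First Name"] + " " + drivers["Last Name"]

# 2. Birthdate: parse as a datetime, then keep only the date part as text.
drivers["Birthdate"] = pd.to_datetime(drivers["Birthdate"]).dt.strftime("%Y-%m-%d")

# 3. Left join: keep every driver, attach license details where they exist.
merged = drivers.merge(licenses, how="left", on="Driver ID")

# 4. Replace the 1 & 2 codes in Test Passed with No & Yes.
merged["Test Passed"] = merged["Test Passed"].replace({1: "No", 2: "Yes"})

# 5. "If statement": Valid only when the test was passed AND points are under 12.
is_valid = (merged["Test Passed"] == "Yes") & (merged["Points"] < 12)
merged["License Status"] = is_valid.map({True: "Valid", False: "Invalid"})

# 6. Split into two tables and export each to its own CSV file.
valid = merged[merged["License Status"] == "Valid"]
invalid = merged[merged["License Status"] == "Invalid"]

out_dir = Path(tempfile.mkdtemp())  # temporary folder; swap in your own
valid.to_csv(out_dir / "valid_licenses.csv", index=False)      # hypothetical name
invalid.to_csv(out_dir / "invalid_licenses.csv", index=False)  # hypothetical name

print(merged[["Full Name", "Birthdate", "Test Passed", "Points", "License Status"]])
```

With your real files, you would pass the file paths to `pd.read_csv` instead of the in-memory text used here, and point `out_dir` at the folder where you want the exports saved.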