Solving real world data science tasks with Python Pandas!

Practice your Python Pandas data science skills with problems on StrataScratch!

In this video we use Python Pandas & Python Matplotlib to analyze and answer business questions about 12 months' worth of sales data. The data contains hundreds of thousands of electronics store purchases broken down by month, product type, cost, purchase address, etc.

Setup!

Check out the first video I did on Pandas:

Check out the videos I did on Matplotlib:

Detailed video description! (timeline can be found in comments)

We start by cleaning our data. Tasks during this section include:
- Drop NaN values from DataFrame
- Removing rows based on a condition
- Change the type of columns (to_numeric, to_datetime, astype)
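
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up DataFrame, not the exact code from the video; the column names 'Quantity Ordered' and 'Price Each' and all sample values are assumptions.

```python
import pandas as pd

# Tiny stand-in for the raw sales data (values and most column names are assumed).
raw = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", None, "Order Date", "04/07/19 22:30"],
    "Quantity Ordered": ["2", None, "Quantity Ordered", "1"],
    "Price Each": ["11.95", None, "Price Each", "99.99"],
})

# 1) Drop rows where every value is NaN.
clean = raw.dropna(how="all")

# 2) Remove rows based on a condition: here, stray header rows repeated in the data.
clean = clean[clean["Order Date"] != "Order Date"].copy()

# 3) Change column types so we can do math on them.
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"])
clean["Price Each"] = clean["Price Each"].astype(float)

# With numeric columns in place, a sales column is a simple product.
clean["Sales"] = clean["Quantity Ordered"] * clean["Price Each"]
```

Filtering with a boolean mask (step 2) keeps only the rows where the condition is True, which is usually simpler than dropping rows by index.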

Once we have cleaned up our data a bit, we move to the data exploration section. In this section we explore 5 high level business questions related to our data:
- What was the best month for sales? How much was earned that month?
- What city sold the most product?
- What time should we display advertisements to maximize the likelihood of customers buying product?
- What products are most often sold together?
- What product sold the most? Why do you think it sold the most?

To answer these questions we walk through many different pandas & matplotlib methods. They include:
- Adding columns
- Parsing cells as strings to make new columns (.str)
- Using the .apply() method
- Using groupby to perform aggregate analysis
- Plotting bar charts and line graphs to visualize our results
- Labeling our graphs
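
Here is a small sketch of a few of the methods listed above (adding columns, .str, .apply(), and groupby). The 'Purchase Address' column name, the address layout, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Made-up rows; the address layout "street, city, state zip" is an assumption.
df = pd.DataFrame({
    "Order Date": ["04/19/19 08:46", "04/07/19 22:30", "12/30/19 00:01"],
    "Purchase Address": [
        "917 1st St, Dallas, TX 75001",
        "682 Chestnut St, Boston, MA 02215",
        "333 8th St, Dallas, TX 75001",
    ],
    "Sales": [23.90, 99.99, 11.95],
})

# Parse the first two characters of the date string into a new Month column (.str).
df["Month"] = df["Order Date"].str[0:2].astype("int32")

# Use .apply() with a lambda to pull the city out of each address.
df["City"] = df["Purchase Address"].apply(lambda x: x.split(",")[1].strip())

# Aggregate with groupby: total sales per city.
sales_by_city = df.groupby("City")["Sales"].sum()
```

Selecting the 'Sales' column before calling .sum() keeps the aggregation to the one numeric column we care about.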

If you enjoy this video, make sure to leave it a like and subscribe to not miss any future similar tutorials :).

Check out the new "solving real world data science tasks" video I posted!

---------------------------------------------

Follow me on social media!

---------------------------------------------

Video Timeline!
0:00 - Intro
1:22 - Downloading the Data
2:57 - Getting started with the code (Jupyter Notebook)

Task #1: Merging 12 csvs into a single dataframe (3:35)
4:25 - Read single CSV file
5:44 - List all files in a directory
7:06 - Concatenating files
11:00 - Reading in Updated dataframe
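
A rough sketch of Task #1, assuming the monthly files are plain CSVs in one folder. The file names and contents below are invented so the example runs on its own; it writes two tiny CSVs to a temporary directory first.

```python
import os
import tempfile

import pandas as pd

# Create two small CSVs in a temporary folder so the sketch is self-contained.
# File names and contents are hypothetical.
folder = tempfile.mkdtemp()
pd.DataFrame({"Product": ["USB-C Cable"], "Sales": [11.95]}).to_csv(
    os.path.join(folder, "Sales_January.csv"), index=False
)
pd.DataFrame({"Product": ["Monitor"], "Sales": [149.99]}).to_csv(
    os.path.join(folder, "Sales_February.csv"), index=False
)

# List every CSV in the directory, read each one, then concatenate them.
files = sorted(f for f in os.listdir(folder) if f.endswith(".csv"))
frames = [pd.read_csv(os.path.join(folder, f)) for f in files]
all_data = pd.concat(frames, ignore_index=True)
```

Collecting the frames in a list and calling pd.concat once is generally faster than concatenating inside the loop.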

Task #2: Add a Month column (12:48)
14:12 - Parse string in Pandas cell (.str)

Cleaning our data!
17:31 - Drop NaN values from df
21:25 - Remove rows based on condition

Task #3: Add a sales column (24:58)
25:58 - Another way to convert a column to numeric (ints & floats)

Question #1: What was the best month for sales? (29:20)
30:35 - Visualizing our results with bar chart in matplotlib

Question #2: What city sold the most product? (34:17)
35:32 - Add a city column
36:10 - Using the .apply() method (super useful!!)
40:35 - Why do we use the lambda x ?
40:57 - Dropping a column
46:45 - Answering the question (using groupby)
47:34 - Plotting our results

Question #3: What time should we display advertisements to maximize the likelihood of purchases? (52:13)
53:16 - Using to_datetime() method
56:01 - Creating hour & minute columns
58:17 - Matplotlib line graph to plot our results
1:00:15 - Interpreting our results
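
As a hedged sketch of the steps in Question #3: parse the date strings, pull out hour and minute, and count orders per hour. The sample dates are made up, and the format string assumes dates that look like "04/19/19 08:46".

```python
import pandas as pd

# Made-up order timestamps stored as strings.
df = pd.DataFrame({"Order Date": ["04/19/19 08:46", "04/07/19 22:30", "04/12/19 08:05"]})

# Convert the strings to datetimes; an explicit format skips format inference.
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%y %H:%M")

# Create hour & minute columns with the .dt accessor.
df["Hour"] = df["Order Date"].dt.hour
df["Minute"] = df["Order Date"].dt.minute

# Number of orders placed in each hour of the day.
orders_per_hour = df.groupby("Hour").size()
```

From here, plotting orders_per_hour as a line graph shows which hours are busiest.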

Question #4: What products are most often sold together? (1:02:17)
1:03:31 - Finding duplicate values in our DataFrame
1:05:43 - Use transform() method to join values from two rows into a single row
1:08:00 - Dropping rows with duplicate values
1:09:39 - Counting pairs of products (itertools, collections)
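
The Question #4 steps could look something like this. It's a minimal sketch: the 'Order ID' column name and the product names are assumptions, and the idea is that items sharing an order ID were bought together.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Made-up orders; rows with the same Order ID belong to one purchase.
df = pd.DataFrame({
    "Order ID": [1, 1, 2, 3, 3],
    "Product": ["iPhone", "Lightning Charging Cable", "Monitor",
                "iPhone", "Lightning Charging Cable"],
})

# Keep only orders that appear on more than one row (duplicate Order IDs).
multi = df[df["Order ID"].duplicated(keep=False)].copy()

# Join each order's products into one comma-separated string with transform().
multi["Grouped"] = multi.groupby("Order ID")["Product"].transform(lambda x: ",".join(x))

# Drop the now-duplicated rows so each order appears once.
multi = multi[["Order ID", "Grouped"]].drop_duplicates()

# Count every pair of products that shows up in the same order.
pair_counts = Counter()
for row in multi["Grouped"]:
    pair_counts.update(combinations(row.split(","), 2))
```

Calling pair_counts.most_common(10) then lists the most frequent pairs; changing the 2 in combinations to 3 counts triples instead.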

Question #5: What product sold the most? Why do you think it did? (1:14:04)
1:15:28 - Graphing data
1:18:41 - Overlaying a second Y-axis on existing chart
1:23:41 - Interpreting our results

---------------------

Join the Python Army to get access to perks!

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.
Comments
Author

"I don't know how to do it, but I know how to Google it." This guy knows how things go in the real world, haha.

billyjorrosh
Author

As a programmer/data analyst/systems administrator I can safely say that this is exactly how we solve problems in real life. Good job!

justapugontheinternet
Author

The best part was watching someone Google the answer and seeing how they implement the solution instead of just acting like they know everything. Man, your tutorials are the best and so down to earth.

terrymaverick
Author

Great tutorial!
55:00 When parsing a column into datetime, specifying the format manually will decrease the execution time significantly:
all_data['Order Date'] = pd.to_datetime(all_data['Order Date'], format='%m/%d/%y %H:%M')

sathirasilva
Author

This situation is so realistic. The mistakes, the solving... great video!

helmialfath
Author

This is the most practical Python tutorial video I've ever watched.

mid_paulownia
Author

As a business major with very limited internship experience, I am teaching myself python and data analytics from scratch. This video is literal gold to me because this is one of the few that actually shows the entire wrangling process! Thanks for the great vid!

kyledawes
Author

Hi Keith, I feel obligated to personally thank everyone who helps me pursue my data career, and of course that includes you. I've used your project (and learned a LOT) and modified and added code here and there in my own style for my online portfolio. Moreover, you're a fantastic teacher and you deserve all the credit for helping others like me. Thank you for doing this, may God return the favor and always bless you. Rock on Keith!

edric
Author

Thanks for watching! If you enjoyed, please consider subscribing :).

KeithGalli
Author

Love how this cool dude researches solutions on the fly and explains things as he goes even when he commits minor unforced errors. He is so relatable. His other tutorials on Pandas, Numpy, Matplotlib, etc. are equally helpful. I wish him all the success and hope that he continues to share his knowledge for decades to come.

anthonygonsalvis
Author

Dude, this is by far one of the best real-life tutorials on YT. Subbed for more like this!

Hx
Author

He is like my friend who teaches one day before exams. 😂😅

ujjawaljani
Author

As a new learner of Python, I found this to be one of the best videos on YouTube for beginners. I liked how he dealt with the problems and solved them on the go (not knowing it all, but knowing how to consult Google for the right answer). Way to go! Loved the approach and how easy you made it look.

ijbarraza
Author

Love how realistic and down to earth all your videos are! Makes data analysis way more approachable. What a guy!

olajiireolajide
Author

Keith, you're literally the most underrated and one of the best teachers on YouTube. This exercise cleared most of my doubts about Data Science and I fell in love with it because of you. Thank you so much for this, you're the best!

sushiplatter
Author

Content of this quality deserves far more recognition. Thank you!

ciojxoh
Author

Watching this 4 years after you published it, and you're still a legend! Thank you!!!

Jordanptheone
Author

Your assignments are harder than Coursera's. I'm actually learning something. Major thanks all the way from Holland! 🙏

hoiying-chan
Author

At 50:10, for anyone who wants to use .unique(): when you calculate the sales for each city, make sure to add a .reset_index() in there. It will reset the index and your bar chart will come out right.

Then do the rest like he does. You can also sort the values in ascending or descending order at the same point; just follow the rest of his instructions.

cityy=all_data.groupby("City").sum().reset_index().sort_values("Sales", ascending=False)

xxx=cityy["City"].unique()
plt.bar(xxx, cityy["Sales"])
plt.ylabel("$$$")
plt.xlabel("Cities")
plt.xticks(xxx, rotation='vertical', size=8)
plt.show()

Yayaloy
Author

I love how this guy is explaining, I really enjoyed learning from you.

Magmatic