Solving real world data science tasks with Python Beautiful Soup! (movie dataset creation)

In this video we scrape Wikipedia pages to create a dataset on Disney movies.

The video is formatted with tasks for you to try to solve on your own throughout. For the best learning experience, at each task you should pause the video, try the task on your own, and then resume when you want to see how I would solve it.

We cover a wide range of Python & data science topics in this video. They include:
- Web scraping with BeautifulSoup
- Cleaning data
- Testing code with pytest
- Pattern matching with regular expressions (re library)
- Working with dates (datetime library)
- Saving & loading data with the pickle library
- Accessing data from an API using the Requests library

If you enjoyed this video, make sure to like & subscribe :)

This video was sponsored by DataCamp

---------------------
Video timeline!
0:00 - Video overview
1:58 - Check out DataCamp! (sponsored)
3:12 - Setup

Task #1: Scrape the infobox from Toy Story 3 wiki page (save in python dictionary) (4:24)

Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries) (28:52)
32:52 - Task #2: Scrape infobox for all movies in List of Disney Films (save as list of dictionaries)
57:27 - Save & Load dataset checkpoint (JSON file)
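
A minimal sketch of what a JSON save/load checkpoint could look like. The function names, the file name, and the sample records are hypothetical placeholders, not taken from the video.

```python
import json
import os
import tempfile


def save_data(path, data):
    """Write a list of dictionaries to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_data(path):
    """Read the list of dictionaries back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Hypothetical records standing in for scraped infoboxes.
movies = [
    {"title": "Example Movie A", "Running time": "85 minutes"},
    {"title": "Example Movie B", "Directed by": ["Person One", "Person Two"]},
]

# Round trip through a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "disney_data.json")
    save_data(path, movies)
    reloaded = load_data(path)
```

Checkpointing like this means the scraping step does not have to be rerun every time the cleaning code changes.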

Task #3: Clean our data! (1:02:04)
1:09:28 - Task #3.1: Strip out all references ([1],[2],etc) from HTML
1:16:39 - Task #3.2: Split up the long strings
1:25:02 - Task #3.3: Examine errors we are getting
1:30:27 - Task #3.4: Convert “Running time” field to an integer
1:44:57 - Task #3.5: Convert “Budget” & “Box office” fields to floats
2:33:53 - Task #3.6: Convert dates into datetime objects
2:47:36 - Saving our data again (using Pickle)
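
The cleaning steps in Task #3 could be sketched roughly as follows. This is one possible approach under stated assumptions, not necessarily the solution shown in the video. All function names and sample strings are hypothetical. References are stripped from already-extracted text with a regular expression rather than from the HTML. Money ranges such as "$4-5 million" are not handled. Date parsing assumes English month names.

```python
import pickle
import re
from datetime import datetime

MULTIPLIERS = {"thousand": 1e3, "million": 1e6, "billion": 1e9}
NUMBER = r"\d+(?:,\d{3})*(?:\.\d+)?"
DATE_FORMATS = ["%B %d, %Y", "%d %B %Y"]  # assumed formats; extend as needed


def strip_references(text):
    """Remove citation markers such as [1] or [12] from extracted text."""
    return re.sub(r"\[\d+\]", "", text).strip()


def minutes_to_integer(running_time):
    """Return the first integer in a string like '85 minutes', or None."""
    if running_time is None:
        return None
    match = re.search(r"\d+", running_time)
    return int(match.group()) if match else None


def money_to_float(money):
    """Convert '$200 million' or '$1,234,567' to a float number of dollars."""
    if money is None:
        return None
    pattern = rf"\$\s*({NUMBER})(?:\s*(thousand|million|billion))?"
    match = re.search(pattern, money, flags=re.IGNORECASE)
    if not match:
        return None
    value = float(match.group(1).replace(",", ""))
    word = match.group(2)
    return value * MULTIPLIERS[word.lower()] if word else value


def date_to_datetime(date_text):
    """Parse the date before any parenthetical note; None if no format fits."""
    if date_text is None:
        return None
    cleaned = date_text.split("(")[0].strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None


# Hypothetical raw values, cleaned and then round-tripped through pickle.
record = {
    "Running time (int)": minutes_to_integer(strip_references("85 minutes[2]")),
    "Budget (float)": money_to_float("$200 million"),
    "Release date (datetime)": date_to_datetime("June 18, 2010 (United States)"),
}
restored = pickle.loads(pickle.dumps(record))
```

Pickle is useful at this stage because it preserves datetime objects, which plain JSON cannot store directly. The in-memory `pickle.dumps`/`pickle.loads` round trip here stands in for writing to a file with `pickle.dump` and `pickle.load`.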

Task #4: Attach IMDB, Metascore, and Rotten Tomatoes scores to dataset (working with APIs) (2:53:18)

Task #5: Save final dataset as a JSON file and as a CSV file (3:13:48)
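
A rough sketch of the final export, assuming the records contain datetime objects that have to be turned into strings before JSON can store them. The field names and sample records are hypothetical. The standard-library csv module is used here as a stand-in, and the video may well use a different tool. Output goes to in-memory strings rather than files to keep the example self-contained.

```python
import csv
import io
import json
from datetime import datetime


def to_serializable(movie):
    """Copy a record, turning datetime values into ISO 8601 strings."""
    return {
        key: value.isoformat() if isinstance(value, datetime) else value
        for key, value in movie.items()
    }


# Hypothetical cleaned records; note that they do not share all keys.
movies = [
    {"title": "Example Movie A", "Running time (int)": 85,
     "Release date (datetime)": datetime(2001, 2, 3)},
    {"title": "Example Movie B", "Running time (int)": 90,
     "Budget (float)": 1500000.0},
]

rows = [to_serializable(movie) for movie in movies]

# JSON export.
json_text = json.dumps(rows, ensure_ascii=False, indent=2)

# CSV export: collect the union of keys, in first-seen order, for the header.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Missing fields become empty cells in the CSV (`restval=""`), which keeps every row aligned with the header.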

---------------------
Extra resources!

Practice your Python Pandas data science skills with problems on StrataScratch!

Join the Python Army to get access to perks!

---------------------
Follow me on social media!

*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.