Scrapy Course – Python Web Scraping for Beginners

The Scrapy Beginners Course will teach you everything you need to know to start scraping websites at scale using Python Scrapy.

The course covers:
- Creating your first Scrapy spider
- Crawling through websites & scraping data from each page
- Cleaning data with Items & Item Pipelines
- Saving data to CSV files, MySQL & Postgres databases
- Using fake user-agents & headers to avoid getting blocked
- Using proxies to scale up your web scraping without getting banned
- Deploying your scraper to the cloud & scheduling it to run periodically
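
For a flavor of what the course builds, here is a minimal sketch of the kind of spider it starts with (it assumes the books.toscrape.com practice site used in the video; the selectors and names are illustrative, not the course's exact code):

    import scrapy

    class BookSpider(scrapy.Spider):
        name = "bookspider"
        start_urls = ["https://books.toscrape.com/"]

        def parse(self, response):
            # each book on the page sits in an <article class="product_pod">
            for book in response.css("article.product_pod"):
                yield {
                    "title": book.css("h3 a::attr(title)").get(),
                    "price": book.css(".price_color::text").get(),
                    "url": book.css("h3 a::attr(href)").get(),
                }
            # follow the pagination link until there are no more pages
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)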

✏️ Course created by Joe Kearney.

⭐️ Resources ⭐️
Course Resources

Cloud Environments

Proxies

⭐️ Contents ⭐️
⌨️ (0:00:00) Part 1 - Scrapy & Course Introduction
⌨️ (0:08:22) Part 2 - Setup Virtual Env & Scrapy
⌨️ (0:16:28) Part 3 - Creating a Scrapy Project
⌨️ (0:28:17) Part 4 - Build your First Scrapy Spider
⌨️ (0:55:09) Part 5 - Build Discovery & Extraction Spider
⌨️ (1:20:11) Part 6 - Cleaning Data with Item Pipelines
⌨️ (1:44:19) Part 7 - Saving Data to Files & Databases
⌨️ (2:04:33) Part 8 - Fake User-Agents & Browser Headers
⌨️ (2:40:12) Part 9 - Rotating Proxies & Proxy APIs
⌨️ (3:18:12) Part 10 - Run Spiders in Cloud with Scrapyd
⌨️ (4:03:46) Part 11 - Run Spiders in Cloud with ScrapeOps
⌨️ (4:20:04) Part 12 - Run Spiders in Cloud with Scrapy Cloud
⌨️ (4:30:36) Part 13 - Conclusion & Next Steps

🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan

--

Comments

14:45 `source venv/bin/activate` is for Mac/Linux; if you're on Windows, use ".\venv\Scripts\activate" in your terminal instead.

NiranjanND

I'm in part 8 and I can't thank you enough for this course! The level of knowledge given is UNREAL!!!

johnteles

The issue we faced in part 6 was that the values added to the attributes of our `BookItem` instance in the `parse_book_page` method were being passed as `tuples` instead of `strings`. Removing commas at the end of the values should resolve this issue. Once we fix this problem, everything should work perfectly without needing to modify the `process_item` method.
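
To see why the trailing comma matters, here is a quick illustration (the field and selector are hypothetical stand-ins for the ones in the course):

    # a trailing comma turns the assigned value into a one-element tuple
    book_item["price"] = response.css(".price_color::text").get(),   # ('£51.77',)
    # without the comma you get the plain string you wanted
    book_item["price"] = response.css(".price_color::text").get()    # '£51.77'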

VasylPoremchuk

FYI for those who want to scrape dynamic websites: dynamic websites need Selenium, which is not included in this course.
But no cap, this is a great course.

ruzhanislam

Amazing tutorial! I've only gone through half of it, and I can say it's really easy to follow along and it does work! Thanks a lot!

leolion

Thank you for the time you've put into this tutorial. That being said, you should make it clear that the setup is different for Windows than for Mac; there is no bin folder, for example.

flanderstruck

This is the first coding course I've followed through to the end. Nicely taught. Keep it up.

omyeole

At 52:00 you don't need to check for "catalogue". You can just follow the URL in the <a> tag, and it gives me 1000 items.
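
A sketch of what this suggestion looks like inside the spider's parse method (it assumes the books.toscrape.com spider from the video; response.follow() resolves relative URLs against response.url, so no "catalogue" check is needed):

    def parse(self, response):
        for book in response.css("article.product_pod"):
            relative_url = book.css("h3 a::attr(href)").get()
            # response.follow() joins the relative URL onto the current page URL
            yield response.follow(relative_url, callback=self.parse_book_page)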

v_iancu

I am a Python newbie without any experience in coding. With the help of this guide I am able to write a spider and fully understand the architecture. Really helpful 👍👍👍 They also have other guides to help you polish your spider and get it working. Highly recommended!

greypeng

Thanks for another great video, FreeCodeCamp! This is something I've wanted to spend more time on with Python for a long time!!

lemastertech

Note for Windows users:
To activate virtual env, type venv\Scripts\activate

TriinTamburiin

This tutorial really needed the code-along aspect to help make sense of what is going on and to fix errors. Thanks!

terraflops

I watched it twice, and I think it could be shortened quite a lot and better organized.

felicytatomaszewska

13:37 creating venv
17:45 create scrapy project
29:31 create spider
33:38 shell

falcongold

This is so cool! I was able to follow until Part 6, but from Part 7 I couldn't, so I will come back in the future after I have basic knowledge of MySQL and databases. (Note to myself.)

seaondao

1:34:58 Instead of using a lot of if statements, use a mapping. For example:

    # saving the rating of the book as an integer
    ratings = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
    rating = adapter.get("rating")
    if rating:
        adapter["rating"] = ratings[rating]

This is not only faster, but it also looks cleaner.
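
For context, a sketch of where that mapping would live, inside a pipeline's process_item() (the class name is assumed; ItemAdapter comes from the itemadapter package that Scrapy pipelines conventionally use):

    from itemadapter import ItemAdapter

    class BookscraperPipeline:
        def process_item(self, item, spider):
            adapter = ItemAdapter(item)
            ratings = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
            rating = adapter.get("rating")
            if rating:
                # .get() with a fallback avoids a KeyError on unexpected values
                adapter["rating"] = ratings.get(rating, rating)
            return item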

pkavenger

I'm starting this course now and am very excited! Thanks for the effort you put into teaching it.

Felipe-ibcx

🎯 Key Takeaways for quick navigation:

00:00 *Scrapy Beginners Course*
01:51 *Scrapy: Open Source Framework*
03:12 *Scrapy vs. Python Requests*
04:24 *Scrapy Benefits & Features*
05:21 *Course Delivery & Resources*
06:18 *Course Outline Overview*
08:20 *Setting Up Python Environment*
16:38 *Creating Scrapy Project*
20:05 *Overview of Scrapy Files*
26:07 *Understanding Settings & Middleware*
27:13 *Settings and pipelines*
28:22 *Creating Scrapy spider*
30:24 *Understanding basic spider structure*
33:32 *Installing IPython for Scrapy shell*
34:27 *Using Scrapy shell for testing*
36:35 *Extracting data using CSS selectors*
38:23 *Extracting book title*
39:43 *Extracting book price*
40:49 *Extracting book URL*
41:18 *Practice using CSS selectors*
42:02 *Looping through book list*
43:15 *Running Scrapy spider*
47:29 *Handling pagination*
53:52 *Debugging and troubleshooting*
56:12 *Moving to detailed data extraction*
Update Next Page
Define Callback Function
Start Fleshing Out
Data cleaning process: Remove currency signs, convert prices, format strings, validate data.
Standardization of data: Remove encoding, format category names, trim whitespace.
Pipeline processing: Strip whitespace, convert uppercase to lowercase, clean price data, handle availability.
Converting data types: Convert reviews and star ratings to integers.
Importance of data refinement: Iterative process of refining data and pipeline adjustments.
Saving data to different formats: CSV, JSON, and database (MySQL).
Different methods of saving data: Command line, feed settings, and custom settings.
Setting up MySQL database: Installation, creating a database, installing MySQL connector.
Setting up pipeline for MySQL: Initialize connection and cursor, create table if not exists.
01:56:31 *Create MySQL table*
02:04:42 *Understand user agents*
02:13:03 *Implement user agents*
02:25:01 *Scrapy API request*
02:26:11 *Fake user agents*
02:27:20 *Middleware setup*
02:33:00 *Robots.txt considerations*
02:40:19 *Proxies introduction*
02:42:34 *Proxy lists overview*
02:52:17 *Proxy ports alternative*
02:52:32 *Proxy provider benefits*
02:53:12 *Smartproxy overview*
02:54:44 *Residential vs. Datacenter proxies*
02:55:27 *Smartproxy signup process*
02:56:19 *Configuring Smartproxy settings*
02:58:07 *Adjusting spider settings*
03:00:23 *Creating a custom middleware*
03:01:21 *Setting up middleware parameters*
03:03:02 *Fixing domain allowance*
03:04:17 *Successful proxy usage confirmation*
03:05:00 *Introduction to proxy API endpoints*
03:06:29 *Obtaining API key for proxy API*
03:07:54 *Implementing proxy API usage*
03:10:36 *Ensuring proper function of proxy middleware*
03:12:10 *Simplifying proxy integration with SDK*
03:13:25 *Configuring SDK settings*
03:14:47 *Testing SDK integration*
03:17:56 *Upcoming sections on deployment and scheduling*
03:21:22 *Scrapyd: free, configuration required.*
03:21:35 *ScrapeOps: UI interface, monitoring, scheduling.*
03:22:02 *Scrapy Cloud: paid, easy setup, no server needed.*
03:49:42 *Dashboard configuration guide.*
03:51:21 *Set up ScrapeOps account.*
03:52:48 *Install monitoring extension.*
03:55:24 *Server setup instructions.*
04:00:51 *Job status and stats.*
04:01:47 *Analyzing stats for optimization.*
04:02:42 *Integration with ScrapeOps.*
04:18:05 *Scheduler Tab Options*
04:19:14 *Job Comparisons Dashboard*
04:20:15 *Scrapy Cloud Introduction*
04:21:36 *Scrapy Cloud Features*
04:22:20 *Scrapy Cloud Setup*
04:25:33 *Cloud Job Management*
04:28:57 *Scrapy Cloud Summary*

Made with HARPA AI

hxxzxtf

Thank you so much for providing this content for free. It's truly incredible that anyone with an internet connection can get a free coding education, and it's all thanks to people like you!

johnnygoffla

A wonderful video that we've used as a reference for our recent additions. Your sharing is highly appreciated!

Autoscraping