Scrapy Course – Python Web Scraping for Beginners

The Scrapy Beginners Course will teach you everything you need to know to start scraping websites at scale using Python Scrapy.

The course covers:
- Creating your first Scrapy spider
- Crawling through websites & scraping data from each page
- Cleaning data with Items & Item Pipelines
- Saving data to CSV files, MySQL & Postgres databases
- Using fake user-agents & headers to avoid getting blocked
- Using proxies to scale up your web scraping without getting banned
- Deploying your scraper to the cloud & scheduling it to run periodically
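
For a flavor of what the course builds, here is a minimal sketch of the kind of spider it starts with (it assumes the books.toscrape.com practice site used in the video; the selectors and names are illustrative, not the course's exact code):

    import scrapy

    class BookSpider(scrapy.Spider):
        name = "bookspider"
        start_urls = ["https://books.toscrape.com/"]

        def parse(self, response):
            # each book on the page sits in an <article class="product_pod">
            for book in response.css("article.product_pod"):
                yield {
                    "title": book.css("h3 a::attr(title)").get(),
                    "price": book.css(".price_color::text").get(),
                    "url": book.css("h3 a::attr(href)").get(),
                }
            # follow the pagination link until there are no more pages
            next_page = response.css("li.next a::attr(href)").get()
            if next_page is not None:
                yield response.follow(next_page, callback=self.parse)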

✏️ Course created by Joe Kearney.

⭐️ Resources ⭐️
Course Resources

Cloud Environments

Proxies

⭐️ Contents ⭐️
⌨️ (0:00:00) Part 1 - Scrapy & Course Introduction
⌨️ (0:08:22) Part 2 - Setup Virtual Env & Scrapy
⌨️ (0:16:28) Part 3 - Creating a Scrapy Project
⌨️ (0:28:17) Part 4 - Build your First Scrapy Spider
⌨️ (0:55:09) Part 5 - Build Discovery & Extraction Spider
⌨️ (1:20:11) Part 6 - Cleaning Data with Item Pipelines
⌨️ (1:44:19) Part 7 - Saving Data to Files & Databases
⌨️ (2:04:33) Part 8 - Fake User-Agents & Browser Headers
⌨️ (2:40:12) Part 9 - Rotating Proxies & Proxy APIs
⌨️ (3:18:12) Part 10 - Run Spiders in Cloud with Scrapyd
⌨️ (4:03:46) Part 11 - Run Spiders in Cloud with ScrapeOps
⌨️ (4:20:04) Part 12 - Run Spiders in Cloud with Scrapy Cloud
⌨️ (4:30:36) Part 13 - Conclusion & Next Steps

🎉 Thanks to our Champion and Sponsor supporters:
👾 davthecoder
👾 jedi-or-sith
👾 南宮千影
👾 Agustín Kussrow
👾 Nattira Maneerat
👾 Heather Wcislo
👾 Serhiy Kalinets
👾 Justin Hual
👾 Otis Morgan

--

Comments

14:45 `source venv/bin/activate` is for Mac/Linux; if you're on Windows, use ".\venv\Scripts\activate" in your terminal instead.

NiranjanND

I'm in part 8 and I can't thank you enough for this course! The level of knowledge given is UNREAL!!!

johnteles

The issue we faced in part 6 was that the values added to the attributes of our `BookItem` instance in the `parse_book_page` method were being passed as `tuples` instead of `strings`. Removing commas at the end of the values should resolve this issue. Once we fix this problem, everything should work perfectly without needing to modify the `process_item` method.
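
To see why the trailing comma matters, here is a quick illustration (the field and selector are hypothetical stand-ins for the ones in the course):

    # a trailing comma turns the assigned value into a one-element tuple
    book_item["price"] = response.css(".price_color::text").get(),   # ('£51.77',)
    # without the comma you get the plain string you wanted
    book_item["price"] = response.css(".price_color::text").get()    # '£51.77'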

VasylPoremchuk

FYI for those who want to scrape dynamic websites: dynamic websites need Selenium, which is not included in this course.
But no cap, this is a great course.

ruzhanislam

Amazing tutorial! I've only gone through half of it, and I can say it's really easy to follow along and it does work! Thanks a lot!

leolion

Thank you for the time you've put into this tutorial. That being said, you should make it clear that the setup is different for Windows than for Mac; there is no bin folder, for example.

flanderstruck

This is the first coding course I've followed through to the end. Nicely taught. Keep it up.

omyeole

At 52:00 you don't need to check for "catalogue". You can just follow the URL in the <a> tag, and it gives me 1000 items.
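
A sketch of what this suggestion looks like inside the spider's parse method (it assumes the books.toscrape.com spider from the video; response.follow() resolves relative URLs against response.url, so no "catalogue" check is needed):

    def parse(self, response):
        for book in response.css("article.product_pod"):
            relative_url = book.css("h3 a::attr(href)").get()
            # response.follow() joins the relative URL onto the current page URL
            yield response.follow(relative_url, callback=self.parse_book_page)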

v_iancu

I am a Python newbie without any experience in coding. With the help of this guide I am able to write a spider and fully understand the architecture. Really helpful 👍👍👍 They also have other guides to help you polish your spider and get it working. Highly recommended!

greypeng

Thanks for another great video, FreeCodeCamp! This is something I've wanted to spend more time on with Python for a long time!!

lemastertech

Note for Windows users:
To activate virtual env, type venv\Scripts\activate

TriinTamburiin

This tutorial really needed the code-along aspect to help make sense of what is going on and to fix errors. Thanks!

terraflops

I watched it twice, and I think it could be shortened quite a lot and better organized.

felicytatomaszewska

13:37 creating venv
17:45 create scrapy project
29:31 create spider
33:38 shell

falcongold

This is so cool! I was able to follow until Part 6, but from Part 7 I couldn't, so I will come back in the future after I have basic knowledge of MySQL and databases. (Note to myself.)

seaondao

1:34:58 Instead of using a lot of if statements, use a mapping. For example:

    # saving the rating of the book as an integer
    ratings = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
    rating = adapter.get("rating")
    if rating:
        adapter["rating"] = ratings[rating]

This is not only faster, but it also looks cleaner.
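
For context, a sketch of where that mapping would live, inside a pipeline's process_item() (the class name is assumed; ItemAdapter comes from the itemadapter package that Scrapy pipelines conventionally use):

    from itemadapter import ItemAdapter

    class BookscraperPipeline:
        def process_item(self, item, spider):
            adapter = ItemAdapter(item)
            ratings = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}
            rating = adapter.get("rating")
            if rating:
                # .get() with a fallback avoids a KeyError on unexpected values
                adapter["rating"] = ratings.get(rating, rating)
            return item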

pkavenger

I'm starting this course now and am very excited! Thanks for the effort you put into teaching it.

Felipe-ibcx

🎯 Key Takeaways for quick navigation:

00:00 *Scrapy Beginners Course*
01:51 *Scrapy: Open Source Framework*
03:12 *Scrapy vs. Python Requests*
04:24 *Scrapy Benefits & Features*
05:21 *Course Delivery & Resources*
06:18 *Course Outline Overview*
08:20 *Setting Up Python Environment*
16:38 *Creating Scrapy Project*
20:05 *Overview of Scrapy Files*
26:07 *Understanding Settings & Middleware*
27:13 *Settings and pipelines*
28:22 *Creating Scrapy spider*
30:24 *Understanding basic spider structure*
33:32 *Installing IPython for Scrapy shell*
34:27 *Using Scrapy shell for testing*
36:35 *Extracting data using CSS selectors*
38:23 *Extracting book title*
39:43 *Extracting book price*
40:49 *Extracting book URL*
41:18 *Practice using CSS selectors*
42:02 *Looping through book list*
43:15 *Running Scrapy spider*
47:29 *Handling pagination*
53:52 *Debugging and troubleshooting*
56:12 *Moving to detailed data extraction*
Update Next Page
Define Callback Function
Start Fleshing Out
Data cleaning process: Remove currency signs, convert prices, format strings, validate data.
Standardization of data: Remove encoding, format category names, trim whitespace.
Pipeline processing: Strip whitespace, convert uppercase to lowercase, clean price data, handle availability.
Converting data types: Convert reviews and star ratings to integers.
Importance of data refinement: Iterative process of refining data and pipeline adjustments.
Saving data to different formats: CSV, JSON, and database (MySQL).
Different methods of saving data: Command line, feed settings, and custom settings.
Setting up MySQL database: Installation, creating a database, installing MySQL connector.
Setting up pipeline for MySQL: Initialize connection and cursor, create table if not exists.
01:56:31 *Create MySQL table*
02:04:42 *Understand user agents*
02:13:03 *Implement user agents*
02:25:01 *Scrapy API request*
02:26:11 *Fake user agents*
02:27:20 *Middleware setup*
02:33:00 *Robots.txt considerations*
02:40:19 *Proxies introduction*
02:42:34 *Proxy lists overview*
02:52:17 *Proxy ports alternative*
02:52:32 *Proxy provider benefits*
02:53:12 *Smartproxy overview*
02:54:44 *Residential vs. Datacenter proxies*
02:55:27 *Smartproxy signup process*
02:56:19 *Configuring Smartproxy settings*
02:58:07 *Adjusting spider settings*
03:00:23 *Creating a custom middleware*
03:01:21 *Setting up middleware parameters*
03:03:02 *Fixing domain allowance*
03:04:17 *Successful proxy usage confirmation*
03:05:00 *Introduction to proxy API endpoints*
03:06:29 *Obtaining API key for proxy API*
03:07:54 *Implementing proxy API usage*
03:10:36 *Ensuring proper function of proxy middleware*
03:12:10 *Simplifying proxy integration with SDK*
03:13:25 *Configuring SDK settings*
03:14:47 *Testing SDK integration*
03:17:56 *Upcoming sections on deployment and scheduling*
03:21:22 *Scrapyd: free, configuration required.*
03:21:35 *ScrapeOps: UI interface, monitoring, scheduling.*
03:22:02 *Scrapy Cloud: paid, easy setup, no server needed.*
03:49:42 *Dashboard configuration guide.*
03:51:21 *Set up ScrapeOps account.*
03:52:48 *Install monitoring extension.*
03:55:24 *Server setup instructions.*
04:00:51 *Job status and stats.*
04:01:47 *Analyzing stats for optimization.*
04:02:42 *Integration with ScrapeOps.*
04:18:05 *Scheduler Tab Options*
04:19:14 *Job Comparisons Dashboard*
04:20:15 *Scrapy Cloud Introduction*
04:21:36 *Scrapy Cloud Features*
04:22:20 *Scrapy Cloud Setup*
04:25:33 *Cloud Job Management*
04:28:57 *Scrapy Cloud Summary*

Made with HARPA AI

hxxzxtf

Thank you so much for providing this content for free. It's truly incredible that anyone with an internet connection can get a free coding education, and it's all thanks to people like you!

johnnygoffla

A wonderful video that we've used as a reference for our recent additions. Your sharing is highly appreciated!

Autoscraping