Fastest Python Web Scraper - Exploring Sessions, Multiprocessing, Multithreading, and Scrapy

In this video, we will build a fast web scraper. We will begin with BeautifulSoup.
🚀 The first script takes 128 seconds; after optimization, it takes as little as 2.5 seconds.
Finally, we will create a Scrapy spider without any optimization and see what kind of results we get.
We will use BeautifulSoup, Requests, Sessions, Multithreading, Multiprocessing, and Scrapy (a minimal sketch of the session + threading combination follows the chapter list below).
You can jump to the sections you like:
00:31 Scraper Objective
00:44 Creating Scraper with Requests+BS4
09:20 First Run
10:07 Sessions
13:58 Multiprocessing
17:22 Multithreading
22:36 Scrapy Without Optimization
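
To give a feel for where the speed-up comes from, here is a minimal sketch of the session + thread-pool combination discussed in the video. The target site, selector, and worker count are placeholders (books.toscrape.com is a public practice site), not the exact code from the screen:

```python
import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor

# placeholder URL list -- swap in whatever pages you actually need
urls = [f"https://books.toscrape.com/catalogue/page-{i}.html" for i in range(1, 11)]

# one Session reuses the underlying TCP connection instead of reconnecting per request
session = requests.Session()

def scrape(url):
    response = session.get(url)
    soup = BeautifulSoup(response.text, "html.parser")
    # placeholder selector: every book title on a listing page
    return [a["title"] for a in soup.select("article.product_pod h3 a")]

if __name__ == "__main__":
    # threads overlap the network waits, which is where most of the 128 seconds went
    with ThreadPoolExecutor(max_workers=10) as executor:
        for titles in executor.map(scrape, urls):
            print(titles)
```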

----------------------------------------------
What is Web Scraping?
In a nutshell: Web Scraping = Getting Data from Websites with Code
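
As a tiny, hedged illustration (books.toscrape.com is a public practice site, used here only as an example target):

```python
import requests
from bs4 import BeautifulSoup

# download one page and pull a couple of values out of its HTML
response = requests.get("https://books.toscrape.com/")
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.get_text(strip=True))    # the page <title>
print(soup.select_one("h3 a")["title"])   # the first book title on the page
```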

What is Scrapy?
Scrapy is a Python library that makes web scraping very powerful, fast, and efficient.

There are other web scraping libraries too, such as BeautifulSoup. However, when it comes to true power and flexibility, Scrapy is the most capable.
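
For reference, here is a bare-bones spider in roughly the "no optimization" shape the video ends with; the start URL and selectors are placeholders, not the project built on screen:

```python
import scrapy


class BooksSpider(scrapy.Spider):
    name = "books"
    start_urls = ["https://books.toscrape.com/"]

    def parse(self, response):
        # one item per book on the listing page
        for book in response.css("article.product_pod"):
            yield {
                "title": book.css("h3 a::attr(title)").get(),
                "price": book.css("p.price_color::text").get(),
            }
        # follow pagination; Scrapy schedules these requests concurrently on its own
        next_page = response.css("li.next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```

Saved as a single file, it runs with: scrapy runspider books_spider.py -o books.json
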
Why Learn Scrapy?
- Most powerful library for scraping
- Easy to master
- Cross-platform: doesn't matter which OS you are using
- Cloud-ready: Can be run on the cloud with a free account

Most important: you would be able to earn money by taking up web scraping gigs as a freelancer.

#scrapy #fast #beautifulsoup #multiprocessing #multithreading

Please watch: "Making Scrapy Playwright fast and reliable"
Comments

Very well explained and structured video. I love the way you took us from the unoptimized version all the way to Scrapy. Thank you for this video, it was very helpful!

anamashraf

Hello everyone. This time the text is smaller than in my other videos. How is the readability? Is it okay, or would larger be better?
Looking forward to your comments.
PS: Please subscribe and like (or dislike) this video 🙂

codeRECODE

Hi Upendra, this is very useful, thanks a lot

ارمینمحمدجانی

When will you be uploading your new course, AI Agent Lecture? I am very excited and waiting eagerly for it.

danish

Thanks a lot for this video, it helped me solve a problem 💪🏿

ataimebenson

Keep up the good work, thanks for the video

bruce

Wow! Awesome video. Would you please let me know if it is possible to perform both multiprocessing and multithreading at the same time?

billygene
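
It can be combined; one hedged way is a process pool where each worker process runs its own thread pool. Everything below (URLs, chunk and worker counts, the scrape function) is a placeholder sketch, not code from the video:

```python
import requests
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

def scrape(url):
    # placeholder "work": just return the page size
    return len(requests.get(url).text)

def scrape_chunk(urls):
    # runs inside one worker process; its threads overlap the network waits
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(scrape, urls))

if __name__ == "__main__":
    all_urls = [f"https://books.toscrape.com/catalogue/page-{i}.html" for i in range(1, 21)]
    chunks = [all_urls[i::4] for i in range(4)]  # 4 chunks, one per process
    with ProcessPoolExecutor(max_workers=4) as pool:
        for sizes in pool.map(scrape_chunk, chunks):
            print(sizes)
```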

Hm, sadly Scrapy is single-threaded and Selenium is blocking if it's called within a Spider, so the Spiders will not execute concurrently then (if they use Selenium instead of Requests to resolve a URL). I wonder how it is possible to crawl that fast with Scrapy while also using Selenium for HTML rendering. Great video btw!

zone
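
One common workaround is to swap Selenium for an async-friendly renderer such as scrapy-playwright (the library behind the follow-up video mentioned above), so rendering does not block the Twisted reactor. A sketch of its standard setup, assuming the scrapy-playwright package is installed:

```python
# settings.py (sketch) -- hand page rendering to Playwright instead of a blocking Selenium call
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# in the spider, requests that need a rendered page are marked like this:
#   yield scrapy.Request(url, meta={"playwright": True})
```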

Hi Upendra,

Thanks for the tutorial.

Can concurrent futures be used to optimize a "while True" loop with an if-then-break at the end?

I saw your tutorial and also did some googling, but couldn't find any example.

Most of the examples are 'for' loops or 'while' loops with a predefined range.

DittoRahmat
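
There is no direct way to parallelise a truly open-ended "while True" loop, because the break condition is only known after a result comes back. A hedged pattern that often works is to fetch speculative batches and stop once a batch contains the end signal; the URL and the status-code stop condition below are placeholders:

```python
import requests
from concurrent.futures import ThreadPoolExecutor

def fetch(page):
    return requests.get(f"https://books.toscrape.com/catalogue/page-{page}.html")

results = []
with ThreadPoolExecutor(max_workers=10) as executor:
    page = 1
    while True:
        # fetch the next 10 pages concurrently, accepting that we may overshoot the end
        batch = list(executor.map(fetch, range(page, page + 10)))
        results.extend(r for r in batch if r.status_code == 200)
        if any(r.status_code != 200 for r in batch):
            break  # a request ran past the last page, so we are done
        page += 10

print(len(results), "pages fetched")
```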

The video I currently need. Just curious, can you make Scrapy faster than that?

chadGPT

Hi Sir,
Awesome video!
I am getting:
"It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the setting for information on how to handle this deprecation.
return cls(crawler)"
Can you tell me about this? Please!

And also, please tell me, where will I get the output file?

vijay
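
On the output-file part of the question: Scrapy writes nothing by default; the file appears wherever you ask for it. A hedged sketch (the spider and file names are placeholders):

```python
# option 1: ask for a file on the command line; it is written in the directory
# you run the command from:
#   scrapy crawl books -o output.json
#
# option 2: declare it once in settings.py with the FEEDS setting
FEEDS = {
    "output.json": {"format": "json", "overwrite": True},
}
```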

I encountered a scenario. While using the scraper_helper library to run the spider directly from a script in VS Code, I get the error below:
"ImportError: attempted relative import with no known parent package"

I have to import the items file inside the spider, which is why it throws this error. Any solutions for this?

MohitAswani
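
A hedged sketch of the usual fix: launch the spider through CrawlerProcess from the project root and import the items module with an absolute path rather than a relative one ("myproject", "books", and "BookItem" are placeholder names):

```python
# run_spider.py, placed next to scrapy.cfg in the project root
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# absolute import; inside the spider use "from myproject.items import BookItem"
# instead of "from ..items import BookItem"
from myproject.spiders.books import BooksSpider

process = CrawlerProcess(get_project_settings())
process.crawl(BooksSpider)
process.start()
```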

Sir, please make some videos on the server-side development part.

ashish

Do you have a video on how to implement multithreading in Scrapy?

ataimebenson
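
Scrapy does not use threads for its requests; it is asynchronous on top of Twisted, so "more parallelism" is usually a settings change rather than a threading change. A sketch with example values only:

```python
# settings.py -- concurrency knobs (values are examples, not recommendations)
CONCURRENT_REQUESTS = 32             # total requests in flight
CONCURRENT_REQUESTS_PER_DOMAIN = 16  # cap per site
DOWNLOAD_DELAY = 0                   # seconds to wait between requests to one site
AUTOTHROTTLE_ENABLED = True          # let Scrapy adapt the rate to the server
```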

Hello friend, congratulations on such an excellent video.

Friend, I have a problem, and I don't know if it can be solved this way; I would appreciate your guidance.

I am creating a web service with FastAPI, which has 2 endpoints where I extract data from 2 websites:
.... /demo1
.... /demo2

When I make a request from Postman, for example, to demo1, the browser opens and everything is fine; it does the extraction and works perfectly.

Following the Postman example, if I make a request to demo1 and right away another to demo2, demo2 must wait for demo1 to finish before it opens the browser and does its extraction.

Can you please guide me on how to solve that?
I hope you can help me.
Greetings.

nelsongomez
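
A likely cause is that the blocking scrape runs inside an async def endpoint and therefore blocks the event loop. A hedged sketch of one fix, moving the blocking work onto a thread so /demo1 and /demo2 can run at the same time (run_scraper_1 and run_scraper_2 are placeholders for whatever opens the browser and extracts the data):

```python
import asyncio
from fastapi import FastAPI

app = FastAPI()

def run_scraper_1():
    ...  # blocking browser/extraction work for site 1 (placeholder)

def run_scraper_2():
    ...  # blocking browser/extraction work for site 2 (placeholder)

@app.get("/demo1")
async def demo1():
    # to_thread keeps the event loop free, so the other endpoint can start meanwhile
    return await asyncio.to_thread(run_scraper_1)

@app.get("/demo2")
async def demo2():
    return await asyncio.to_thread(run_scraper_2)
```

Declaring the endpoints with plain def (no async) has a similar effect, since FastAPI then runs them in its own thread pool.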

Is it possible to automate the CLI using Scrapy?

hayathbasha

Python is not multithreaded unfortunately

CherifRahal