Web Scraping with AIOHTTP and Python

AIOHTTP is a client- and server-side library for Python 3.6 and above that lets us make HTTP requests asynchronously. It's fully featured, supporting sessions, cookies, custom headers and everything else you'd expect to see - so naturally I thought it would be a useful tool to share for building more advanced web scrapers.
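
As a rough sketch of what that looks like in practice (illustrative only, not the code from the video - the URL and User-Agent header below are placeholders), a single request through a session might be:

import asyncio
import aiohttp

async def fetch_one(url):
    # Custom headers (and cookies) are set on the session, as mentioned above.
    headers = {"User-Agent": "my-scraper/0.1"}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            return await response.text()

if __name__ == "__main__":
    html = asyncio.run(fetch_one("https://example.com"))
    print(len(html))

Cookies work the same way: the session keeps a cookie jar across all requests made through it.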

When we scrape data from the web, the chances are we will need to make multiple requests to the server to extract the information we are after. Each of these requests takes time, and our code effectively sits waiting for the response from the server before making the next one, which slows the whole process right down. In its simplest form, AIOHTTP lets us use Python's asyncio library to send large numbers of requests in a short amount of time, so we can build faster and more efficient web scrapers.
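
A minimal sketch of that fan-out pattern, assuming a plain list of URLs (the fetch/main structure and the example URLs here are illustrative, not necessarily how the video organises its code):

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        # Create one task per URL and let the event loop overlap the waiting.
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(1, 6)]
    pages = asyncio.run(main(urls))
    print(len(pages), "pages downloaded")

Because every await hands control back to the event loop, the total time is roughly that of the slowest response rather than the sum of all of them.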


#Timestamps

00:00 Intro
01:17 Docs
02:12 Demo Code
03:54 Web Scraper
09:38 HTML from each page
10:00 Parse HTML
12:10 Expanding Discussion
13:21 Outro
Comments

John,

Great tutorial... many thanks... now I know how to juggle... ;)

Wanted to pass on an observation... apparently Windows can be cranky with asyncio/aiohttp. Your example program throws a “RuntimeError: Event loop is closed” error.

However, adding towards the bottom:

… on top of

pages = asyncio.run(main(urls))

… solves whatever 'Event loop' issues were present.

Rasstag
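
The snippet itself didn't survive in the comment above, but the workaround most often suggested for this Windows error is to switch asyncio to the selector event loop policy just before the asyncio.run call. A guarded sketch of that (my assumption about what the comment refers to, not a confirmed quote of it):

import sys
import asyncio

if sys.platform == "win32":
    # Use the selector event loop on Windows so the loop isn't closed
    # out from under aiohttp's connection cleanup.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

# ... then run the scraper exactly as before:
# pages = asyncio.run(main(urls))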

Hi John, thank you very much for this. Found this video while trying to figure out how to include an async AIOHTTP loop in some API processing script I'm writing and this was invaluable for figuring out how to structure the code.

efferington

No requests from me... just love your videos John! - Thanks for spending the time...
I need this code to pull data for 188,600 items (each one is a web page... with 3 tables each) -
UiPath would take about 32 days to complete. - asyncio + aiohttp should be much, much faster!
Thanks for the tip Rasstag (had the same issue in Windows)

andresvideo

HEEY MAAAN, RECENTLY WATCHED YOUR VIDS ON SCRAPY!

YOU'RE SAVING MY LIFE AGAIN! THANKS!

GelsYT

Your videos and topics just keep getting better. Great job!

celerystalk

Always trying new ways of scraping. Great 👍🌹

tubelessHuma

Thank you, it was really helpful to grasp the asyncio concept.

fuad

Really good explanation! Thanks a lot!

crmfhph

Thank you, John! Really nice tutorial, helped a lot.

mlkofvm

@John Watson Rooney Good tutorial! Thanks! But lines 23 to 26 are synchronous, no?

coala

Hi John, great videos by the way! I was wondering, how can I scrape a website for the ASINs, product title, stock levels and price?

abundance-pc

Subscribed. What are your thoughts on going about it this way vs something like scrapy?

DerekMurawsky

Hi John, thanks a lot for your wonderful videos. I was wondering, which is faster for web scraping, async or multithreading?

peterpann

Thank you so much, I have learned a lot from your videos. I have a question: is there a similar option for pages behind Cloudflare? I currently use cloudscraper, but it has bugs. Do you recommend something?

JesusTorres-bteb

So basically what's happening here in the whole program is that, on the event loop, while each task's request is being made and it is waiting, it passes control to the other task functions, and so on and so forth up until we get the response? I'm sorry if it's not that clear.

GelsYT
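
That's roughly the idea. A toy illustration of the hand-off (nothing here is from the video; asyncio.sleep simply stands in for waiting on the server):

import asyncio

async def task(name, delay):
    print(f"{name}: request sent, waiting...")
    await asyncio.sleep(delay)   # stands in for the server response time
    print(f"{name}: response received")

async def main():
    # Both "requests" overlap, so this finishes in about 2 seconds, not 3.
    await asyncio.gather(task("A", 2), task("B", 1))

asyncio.run(main())

With aiohttp the await happens inside session.get and response.text instead of asyncio.sleep, but the hand-off to the event loop works the same way.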

Your videos are very informative...
Bro, can you make a video on web scraping where cookies expire after 30 mins? An example website is NSE, etc.

nishant

Great tutorial! How do we get around IP bans? Bombing the server with async requests often gets me banned.

jithin.johnson
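
One common mitigation, shown here only as a sketch and not as advice from the video, is to cap how many requests are in flight at once with an asyncio.Semaphore (the limit of 5 is an arbitrary example):

import asyncio
import aiohttp

CONCURRENCY = 5  # arbitrary example cap

async def fetch(session, url, semaphore):
    async with semaphore:   # at most CONCURRENCY requests in flight at once
        async with session.get(url) as response:
            return await response.text()

async def main(urls):
    semaphore = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, url, semaphore) for url in urls))

The semaphore only limits concurrency; it doesn't change the structure, so it drops straight into the gather pattern used earlier.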

Can you please explain how to scrape product prices from a webstore and send a Telegram alert when the price drops? Thanks for your video.

anto
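
A very rough sketch of that idea (everything below is a placeholder - the product URL, the CSS selector, the target price, and the Telegram bot token and chat id - and it is not code from the video):

import asyncio
import aiohttp
from bs4 import BeautifulSoup

PRODUCT_URL = "https://example.com/product"   # placeholder
PRICE_SELECTOR = "span.price"                 # placeholder CSS selector
TARGET_PRICE = 50.0                           # alert when the price falls below this
BOT_TOKEN = "YOUR_BOT_TOKEN"                  # placeholder Telegram bot token
CHAT_ID = "YOUR_CHAT_ID"                      # placeholder chat id

async def check_price():
    async with aiohttp.ClientSession() as session:
        async with session.get(PRODUCT_URL) as response:
            soup = BeautifulSoup(await response.text(), "html.parser")
        price = float(soup.select_one(PRICE_SELECTOR).get_text(strip=True).lstrip("£$€"))
        if price < TARGET_PRICE:
            api = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
            async with session.post(api, data={"chat_id": CHAT_ID,
                                               "text": f"Price dropped to {price}"}):
                pass

asyncio.run(check_price())

Running this on a schedule (cron, or a loop with asyncio.sleep) turns it into a simple price watcher.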

Cool tutorial. Just one question: what could we do if we want to add new URLs to the task list from the parsed results?

rotatingmind
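
One way to do that, sketched here with an asyncio.Queue and a minimal example extract_links helper rather than anything from the video, is to have workers push newly discovered URLs back onto the queue:

import asyncio
import aiohttp
from bs4 import BeautifulSoup

def extract_links(html):
    # Minimal example parser: keep only absolute links found on the page.
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if a["href"].startswith("http")]

async def worker(session, queue, seen, results):
    while True:
        url = await queue.get()
        try:
            async with session.get(url) as response:
                html = await response.text()
        except aiohttp.ClientError:
            queue.task_done()
            continue
        results[url] = html
        for link in extract_links(html):
            if link not in seen:          # queue each newly discovered URL once
                seen.add(link)
                queue.put_nowait(link)
        queue.task_done()

async def crawl(start_urls, num_workers=5):
    queue, seen, results = asyncio.Queue(), set(start_urls), {}
    for url in start_urls:
        queue.put_nowait(url)
    async with aiohttp.ClientSession() as session:
        workers = [asyncio.create_task(worker(session, queue, seen, results))
                   for _ in range(num_workers)]
        await queue.join()               # returns once every queued URL is processed
        for w in workers:
            w.cancel()
    return results

queue.join() only returns once every queued URL has been handled, so the crawl ends naturally when the workers stop discovering new links.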

If you had to pick the best module for web scraping in terms of efficiency and robustness, what would it be? I know selenium, requests, HTMLSessions, aiohttp, AsyncHTMLSession, scrapy, among others. What do you recommend focusing on specifically for its completeness? Thank you for your content.

nachoeigu