Web Scraping with AIOHTTP and Python

AIOHTTP is a client- and server-side library for Python 3.6 and above that lets us make HTTP requests asynchronously. It's fully featured, supporting sessions, cookies, custom headers and everything else you'd expect to see - so naturally I thought it would be a useful tool to share for building more advanced web scrapers.
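
As a rough sketch of what that looks like in practice (illustrative only, not the code from the video - the URL and User-Agent header below are placeholders), a single request through a session might be:

import asyncio
import aiohttp

async def fetch_one(url):
    # Custom headers (and cookies) are set on the session, as mentioned above.
    headers = {"User-Agent": "my-scraper/0.1"}
    async with aiohttp.ClientSession(headers=headers) as session:
        async with session.get(url) as response:
            return await response.text()

if __name__ == "__main__":
    html = asyncio.run(fetch_one("https://example.com"))
    print(len(html))

Cookies work the same way: the session keeps a cookie jar across all requests made through it.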

When we scrape data from the web, the chances are we will need to make multiple requests to the server to extract the information we are after. Each of these requests takes time, and our code effectively sits waiting for the response from the server before making the next one, which slows the whole process right down. In its simplest form, AIOHTTP lets us use Python's asyncio library to send large numbers of requests in a short amount of time, so we can build faster and more efficient web scrapers.
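
A minimal sketch of that fan-out pattern, assuming a plain list of URLs (the fetch/main structure and the example URLs here are illustrative, not necessarily how the video organises its code):

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        # Create one task per URL and let the event loop overlap the waiting.
        tasks = [fetch(session, url) for url in urls]
        return await asyncio.gather(*tasks)

if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(1, 6)]
    pages = asyncio.run(main(urls))
    print(len(pages), "pages downloaded")

Because every await hands control back to the event loop, the total time is roughly that of the slowest response rather than the sum of all of them.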


#Timestamps

00:00 Intro
01:17 Docs
02:12 Demo Code
03:54 Web Scraper
09:38 HTML from each page
10:00 Parse HTML
12:10 Expanding Discussion
13:21 Outro
Comments

John,

Great tutorial... many thanks... now I know how to juggle... ;)

Wanted to pass on an observation... apparently Windows can be cranky with asyncio/aiohttp. Your example program throws a “RuntimeError: Event loop is closed” error.

However, adding towards the bottom:

… on top of

pages = asyncio.run(main(urls))

… solves whatever 'Event loop' issues were present.

Rasstag
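
The snippet itself didn't survive in the comment above, but the workaround most often suggested for this Windows error is to switch asyncio to the selector event loop policy just before the asyncio.run call. A guarded sketch of that (my assumption about what the comment refers to, not a confirmed quote of it):

import sys
import asyncio

if sys.platform == "win32":
    # Use the selector event loop on Windows so the loop isn't closed
    # out from under aiohttp's connection cleanup.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

# ... then run the scraper exactly as before:
# pages = asyncio.run(main(urls))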

Hi John, thank you very much for this. Found this video while trying to figure out how to include an async AIOHTTP loop in some API processing script I'm writing and this was invaluable for figuring out how to structure the code.

efferington

No requests from me... just love your videos John! - Thanks for spending the time...
I need this code to pull data for 188,600 items (each one is a web page... with 3 tables each) -
UiPath would take about 32 days to complete. - asyncio + aiohttp should be much, much faster!
Thanks for the tip Rasstag (had the same issue in Windows)

andresvideo

HEEY MAAAN, RECENTLY WATCHED YOUR VIDS ON SCRAPY!

YOU'RE SAVING MY LIFE AGAIN! THANKS!

GelsYT

Your videos and topics just keep getting better. Great job!

celerystalk

Always trying new ways of scraping. Great 👍🌹

tubelessHuma

Thank you, it was really helpful to grasp the asyncio concept.

fuad

Really good explanation! Thanks a lot!

crmfhph

Thank you, John! Really nice tutorial, helped a lot.

mlkofvm

@John Watson Rooney Good tutorial! Thanks! But lines 23 to 26 are synchronous, no?

coala

Hi John, great videos by the way! I was wondering, how can I scrape a website for the ASINs, product title, stock levels and price?

abundance-pc

Subscribed. What are your thoughts on going about it this way vs something like scrapy?

DerekMurawsky

Hi John, thanks a lot for your wonderful videos. I was wondering, which is faster for web scraping, async or multithreading?

peterpann

Thank you so much, I have learned a lot from your videos. I have a question: is there a similar option for pages behind Cloudflare? I currently use cloudscraper, but it has bugs. Do you recommend something?

JesusTorres-bteb

So basically what's happening here in the whole program is that, on the event loop, while each task's request is being made and it is waiting, it passes control to the other task functions, and so on and so forth up until we get the response? I'm sorry if it's not that clear.

GelsYT
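
That's roughly the idea. A toy illustration of the hand-off (nothing here is from the video; asyncio.sleep simply stands in for waiting on the server):

import asyncio

async def task(name, delay):
    print(f"{name}: request sent, waiting...")
    await asyncio.sleep(delay)   # stands in for the server response time
    print(f"{name}: response received")

async def main():
    # Both "requests" overlap, so this finishes in about 2 seconds, not 3.
    await asyncio.gather(task("A", 2), task("B", 1))

asyncio.run(main())

With aiohttp the await happens inside session.get and response.text instead of asyncio.sleep, but the hand-off to the event loop works the same way.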

Your videos are very informative...
Bro, can you make a video on web scraping where cookies expire after 30 mins? An example website is NSE, etc.

nishant

Great tutorial! How do we get around IP bans? Bombing the server with async requests often gets me banned.

jithin.johnson
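
One common mitigation, shown here only as a sketch and not as advice from the video, is to cap how many requests are in flight at once with an asyncio.Semaphore (the limit of 5 is an arbitrary example):

import asyncio
import aiohttp

CONCURRENCY = 5  # arbitrary example cap

async def fetch(session, url, semaphore):
    async with semaphore:   # at most CONCURRENCY requests in flight at once
        async with session.get(url) as response:
            return await response.text()

async def main(urls):
    semaphore = asyncio.Semaphore(CONCURRENCY)
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, url, semaphore) for url in urls))

The semaphore only limits concurrency; it doesn't change the structure, so it drops straight into the gather pattern used earlier.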

Can you please explain how to scrape product prices from a webstore and send a Telegram alert when the price drops? Thanks for your video.

anto
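
A very rough sketch of that idea (everything below is a placeholder - the product URL, the CSS selector, the target price, and the Telegram bot token and chat id - and it is not code from the video):

import asyncio
import aiohttp
from bs4 import BeautifulSoup

PRODUCT_URL = "https://example.com/product"   # placeholder
PRICE_SELECTOR = "span.price"                 # placeholder CSS selector
TARGET_PRICE = 50.0                           # alert when the price falls below this
BOT_TOKEN = "YOUR_BOT_TOKEN"                  # placeholder Telegram bot token
CHAT_ID = "YOUR_CHAT_ID"                      # placeholder chat id

async def check_price():
    async with aiohttp.ClientSession() as session:
        async with session.get(PRODUCT_URL) as response:
            soup = BeautifulSoup(await response.text(), "html.parser")
        price = float(soup.select_one(PRICE_SELECTOR).get_text(strip=True).lstrip("£$€"))
        if price < TARGET_PRICE:
            api = f"https://api.telegram.org/bot{BOT_TOKEN}/sendMessage"
            async with session.post(api, data={"chat_id": CHAT_ID,
                                               "text": f"Price dropped to {price}"}):
                pass

asyncio.run(check_price())

Running this on a schedule (cron, or a loop with asyncio.sleep) turns it into a simple price watcher.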

Cool tutorial. Just one question: what could we do if we want to add new URLs to the task list from the parsed results?

rotatingmind
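
One way to do that, sketched here with an asyncio.Queue and a minimal example extract_links helper rather than anything from the video, is to have workers push newly discovered URLs back onto the queue:

import asyncio
import aiohttp
from bs4 import BeautifulSoup

def extract_links(html):
    # Minimal example parser: keep only absolute links found on the page.
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)
            if a["href"].startswith("http")]

async def worker(session, queue, seen, results):
    while True:
        url = await queue.get()
        try:
            async with session.get(url) as response:
                html = await response.text()
        except aiohttp.ClientError:
            queue.task_done()
            continue
        results[url] = html
        for link in extract_links(html):
            if link not in seen:          # queue each newly discovered URL once
                seen.add(link)
                queue.put_nowait(link)
        queue.task_done()

async def crawl(start_urls, num_workers=5):
    queue, seen, results = asyncio.Queue(), set(start_urls), {}
    for url in start_urls:
        queue.put_nowait(url)
    async with aiohttp.ClientSession() as session:
        workers = [asyncio.create_task(worker(session, queue, seen, results))
                   for _ in range(num_workers)]
        await queue.join()               # returns once every queued URL is processed
        for w in workers:
            w.cancel()
    return results

queue.join() only returns once every queued URL has been handled, so the crawl ends naturally when the workers stop discovering new links.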

If you had to pick the best module for web scraping in terms of efficiency and robustness, what would it be? I know selenium, requests, HTMLSessions, aiohttp, AsyncHTMLSession, scrapy, among others. What do you recommend focusing on specifically for its completeness? Thank you for your content.

nachoeigu