Supercharge Your Scraper With ASYNC (here's how)


Async can make your code far more efficient by letting it do other work in between sending requests and waiting for responses. In this video I will show you how to implement async in a web scraper and drastically reduce the time taken to scrape 1,000 pages of data.
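Here is a minimal sketch of that idea using only Python's standard-library `asyncio`. `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real request (in a real scraper it might be `await client.get(url)` on an `httpx.AsyncClient`), so the timings only show how the waits overlap. They say nothing about any real site.

```python
import asyncio
import time


async def fake_fetch(url: str, latency: float = 0.05) -> str:
    # Hypothetical stand-in for a real HTTP request; it only sleeps
    # to simulate waiting on the network.
    await asyncio.sleep(latency)
    return f"<html>{url}</html>"


async def scrape_sequential(urls: list[str]) -> list[str]:
    # Each request starts only after the previous response has arrived.
    return [await fake_fetch(u) for u in urls]


async def scrape_concurrent(urls: list[str]) -> list[str]:
    # Every request is started up front; the event loop switches between
    # them while they wait. gather returns results in the order of `urls`.
    return list(await asyncio.gather(*(fake_fetch(u) for u in urls)))


def timed(scrape, urls: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    pages = asyncio.run(scrape(urls))
    return pages, time.perf_counter() - start


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(20)]
    _, seq_time = timed(scrape_sequential, urls)
    _, conc_time = timed(scrape_concurrent, urls)
    print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

With 20 simulated requests of 0.05 s each, the sequential version takes roughly 1 s and the concurrent one takes roughly the time of a single request. Real gains depend on the server and on how many simultaneous requests it will accept.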

Comments
Author

Excellent as always! I believe my web scraping performance has gotten much better since learning JavaScript. Understanding the async/await concept through JS promises was crucial for me.

saulo_foot
Author

John, I am using ScrapingBee synchronously to scrape 1,000 URLs (and growing), and it takes forever.

ScrapingBee and other proxies allow concurrent requests, and I also know you can do things asynchronously. A video on the difference would be great: why you would choose one or the other, or how you would combine both. Here are some questions:

1. Is concurrency just for the requests, or for the parsing as well? Does it affect writing to a CSV if you have multiple processes running at once?

I appreciate your content. I feel like my scraper is almost there in terms of scalability and efficiency, and I'm really excited.

(Although I probably need to implement a dataclass at some point)

AwB
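On the concurrency and CSV question above: with `asyncio`, everything runs in one thread, so only the network waits overlap. Parsing still happens one piece at a time on the event loop. One simple way to keep CSV output clean is to collect all the results and write the file from a single place. Below is a minimal sketch under those assumptions. `fake_fetch`, the regex-based `parse`, and the `url`/`title` columns are hypothetical placeholders. Separate processes writing to one file would need their own coordination, which this sketch does not cover.

```python
import asyncio
import csv
import io
import re


async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real async HTTP request.
    await asyncio.sleep(0.01)
    return f"<title>Item {url.rsplit('/', 1)[-1]}</title>"


def parse(html: str) -> str:
    # Plain CPU work: it runs on the event-loop thread and overlaps only with
    # other coroutines' network waits, never with other parsing.
    match = re.search(r"<title>(.*?)</title>", html)
    return match.group(1) if match else ""


async def fetch_and_parse(url: str) -> dict:
    html = await fake_fetch(url)
    return {"url": url, "title": parse(html)}


async def scrape(urls: list[str]) -> list[dict]:
    # Requests overlap; results come back in the same order as `urls`.
    return list(await asyncio.gather(*(fetch_and_parse(u) for u in urls)))


def write_csv(rows: list[dict], fh) -> None:
    # A single writer, called once after all tasks finish, so rows cannot interleave.
    writer = csv.DictWriter(fh, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    rows = asyncio.run(scrape([f"https://example.com/item/{i}" for i in range(3)]))
    buf = io.StringIO()
    write_csv(rows, buf)
    print(buf.getvalue())
```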
Author

What do you think of scraping the Google cache? It might speed things up too, since you don't have the JS to download.

Author

Hey John, thanks for this video. I see you recommend httpx over requests for async: what about the AsyncHTMLSession from requests-html?

JulienDeneuville
Author

Can I use async too if the website has a rate limit? For example: 429 Too Many Requests.

christiandeantana
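Async does not get around a rate limit. It mostly makes one easier to hit. A common approach is to cap how many requests are in flight and to retry 429 responses after a delay. The sketch below is minimal and makes several assumptions. `FakeServer` and `FakeResponse` are hypothetical stand-ins for a rate-limited site, and their `status_code`/`headers` attribute names mirror `httpx.Response`. `Retry-After` is honoured only when it is given in seconds, although the header may also carry an HTTP date.

```python
import asyncio
import random


class FakeResponse:
    # Hypothetical response object mirroring two httpx.Response attributes.
    def __init__(self, status_code: int, headers=None):
        self.status_code = status_code
        self.headers = headers or {}


class FakeServer:
    """Hypothetical rate-limited site: answers 429 for the first `fail_times` tries per URL."""

    def __init__(self, fail_times: int):
        self.fail_times = fail_times
        self.attempts = {}

    async def get(self, url: str) -> FakeResponse:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        n = self.attempts.get(url, 0)
        self.attempts[url] = n + 1
        if n < self.fail_times:
            return FakeResponse(429, {"Retry-After": "0"})
        return FakeResponse(200)


async def get_with_backoff(client, url, sem, max_retries=5, base_delay=0.01):
    for attempt in range(max_retries + 1):
        async with sem:  # the semaphore caps how many requests are in flight
            resp = await client.get(url)
        if resp.status_code != 429:
            return resp
        if attempt == max_retries:
            break
        # Use Retry-After when it is a number of seconds (HTTP-date form is not
        # handled here); otherwise back off exponentially.
        retry_after = resp.headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        # Small random jitter so retries don't all fire at the same moment.
        await asyncio.sleep(delay + random.uniform(0, base_delay))
    raise RuntimeError(f"still rate-limited after {max_retries} retries: {url}")


async def main(urls, fail_times=2, limit=3):
    client = FakeServer(fail_times)
    sem = asyncio.Semaphore(limit)
    responses = await asyncio.gather(*(get_with_backoff(client, u, sem) for u in urls))
    return [r.status_code for r in responses], client.attempts
```

The sleep happens outside the semaphore, so a request that is waiting to retry does not hold one of the limited slots.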
Author

I ran into an issue using aiohttp while requesting a bunch of URLs at the same time; I don't know if it's a problem on my end or if the server is not happy with me. Putting a limit on how many TCP connections are made seems to solve the issue. Anyway, I'm beginning to consider httpx as an alternative.

yacinehechmi
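Capping the number of simultaneous connections, as described above, is a common fix when a server starts refusing a flood of requests. Both libraries have a setting for this: aiohttp's `TCPConnector(limit=...)` and httpx's `Limits(max_connections=...)`. The cap can also be enforced in your own code. Below is a minimal sketch of one such pattern, a fixed pool of worker tasks pulling URLs from an `asyncio.Queue`. `fake_fetch` is a hypothetical stand-in for a real request, and the pool size plays the role of the connection limit.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    await asyncio.sleep(0.01)  # hypothetical stand-in for a real request
    return url.upper()


async def worker(queue: asyncio.Queue, results: dict, stats: dict) -> None:
    while True:
        url = await queue.get()
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            results[url] = await fake_fetch(url)
        except Exception as exc:  # keep the worker alive; record the failure
            results[url] = exc
        finally:
            stats["in_flight"] -= 1
            queue.task_done()


async def crawl(urls: list[str], max_connections: int = 5) -> tuple[dict, int]:
    queue: asyncio.Queue = asyncio.Queue()
    for url in urls:
        queue.put_nowait(url)
    results: dict = {}
    stats = {"in_flight": 0, "peak": 0}
    # Only `max_connections` workers exist, so at most that many requests run at once.
    workers = [asyncio.create_task(worker(queue, results, stats)) for _ in range(max_connections)]
    await queue.join()  # wait until every URL has been processed
    for w in workers:
        w.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)
    return results, stats["peak"]
```

Compared with starting one task per URL, the worker pool also keeps memory flat when the URL list grows into the thousands.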
Author

Async code makes things messy. I like to keep my code class-based, and async is hard to handle that way. For speedy things, I use threading, which works fine. If you have any video with async in a class structure, I would love to check it out.

djangodeveloper
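Async code can live inside a class. Methods that do I/O become `async def`, everything else stays an ordinary method, and the class can own its shared resources as an async context manager (`__aenter__`/`__aexit__`). The sketch below is minimal. `AsyncScraper` and its methods are hypothetical names, and the fetch is simulated rather than using a real HTTP client.

```python
import asyncio


class AsyncScraper:
    """Hypothetical class-based scraper; meant to be used with `async with`."""

    def __init__(self, max_concurrency: int = 10):
        self.max_concurrency = max_concurrency
        self.results: dict[str, str] = {}

    async def __aenter__(self):
        # Create shared resources inside the running event loop
        # (e.g. one HTTP client session for the whole run).
        self._sem = asyncio.Semaphore(self.max_concurrency)
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Close shared resources here; returning False lets exceptions propagate.
        return False

    async def fetch(self, url: str) -> str:
        async with self._sem:
            await asyncio.sleep(0.01)  # stand-in for a real request
            return f"<html>{url}</html>"

    def parse(self, html: str) -> str:
        # Ordinary synchronous method: parsing needs no await.
        return html.removeprefix("<html>").removesuffix("</html>")

    async def scrape_one(self, url: str) -> None:
        self.results[url] = self.parse(await self.fetch(url))

    async def run(self, urls: list[str]) -> dict[str, str]:
        await asyncio.gather(*(self.scrape_one(u) for u in urls))
        return self.results


def main(urls: list[str]) -> dict[str, str]:
    async def _main():
        async with AsyncScraper(max_concurrency=5) as scraper:
            return await scraper.run(urls)

    return asyncio.run(_main())
```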
Author

I would like a video showing async and threading when scraping with Playwright!

FabioRBelotto
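The same fan-out can be done with threads, and the two approaches can be combined. The sketch below uses only the standard library. `blocking_fetch` is a hypothetical stand-in that sleeps instead of calling a blocking library such as `requests`, and Playwright itself is not shown. `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_fetch(url: str) -> str:
    # Hypothetical stand-in for a blocking call such as requests.get(url).text
    time.sleep(0.05)
    return f"<html>{url}</html>"


def scrape_with_threads(urls: list[str], max_workers: int = 8) -> list[str]:
    # Each thread blocks on its own request; the pool runs up to
    # `max_workers` at once, and map keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(blocking_fetch, urls))


async def scrape_mixed(urls: list[str]) -> list[str]:
    # From async code, blocking functions can be pushed onto worker threads
    # with asyncio.to_thread so they don't stall the event loop.
    return list(await asyncio.gather(*(asyncio.to_thread(blocking_fetch, u) for u in urls)))
```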
Author

Is it legal to scrape data from websites in foreign countries? Making thousands of requests might crash their website 😅

srikanthkoltur