Nodejs Puppeteer Tutorial #4 - Scrape multiple pages in parallel using puppeteer-cluster

This Puppeteer tutorial is designed for beginners who want to learn how to use the Node.js Puppeteer library for web scraping, testing, and building website bots. Puppeteer is a Node library that provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default but can be configured to run full (non-headless) Chrome or Chromium.
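
The idea behind scraping multiple pages in parallel can be sketched without any browser at all: a fixed number of workers pull URLs from a shared queue until it is empty. The snippet below is only a minimal, dependency-free illustration of that pattern. It is not the puppeteer-cluster API, and the names `runInParallel`, `maxConcurrency`, and `task` are hypothetical; in a real scraper the `task` callback would be the place where a Puppeteer page is opened and read.

```typescript
// Hypothetical helper (not part of puppeteer or puppeteer-cluster):
// runs `task` over `items` with at most `maxConcurrency` tasks in flight.
async function runInParallel<T, R>(
  items: T[],
  maxConcurrency: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each worker keeps taking the next unprocessed item until none are left.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index] as T);
    }
  }

  // Never start more workers than there are items, and always at least one.
  const workerCount = Math.max(1, Math.min(maxConcurrency, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results keep the same order as `items`
}
```

Called as `runInParallel(urls, 2, scrapeOne)`, where `scrapeOne` is an assumed async function that visits one URL, this would process two URLs at a time and return the results in input order.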

Donate
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Bitcoin Wallet: bc1q05j8gcnq4mzvgj603cxdc8xxck4jgnu2ljsrt4
Ethereum Wallet: 0x5e7BD4f473f153d400b39D593A55D68Ce80F8a2e

Tags:
- Nodejs Tutorials
- Puppeteer Nodejs
- Nodejs Puppeteer Tutorial
- Puppeteer Tutorial for Beginners

#nodejs #puppeteer #webscraping

Comments

Thanks a lot for this tutorial series, it was invaluable in getting my project started.

lukasa

Thanks Michael, you have satisfied my curiosity and added to my knowledge. Words are not enough to thank you 💎💎

activehubmolatech

Thank you, this is a very straightforward walkthrough, nicely done!

yinglll

Man, you are a saint.
I've been struggling with my parser for a long time, and the server runs out of resources because of leaks.
I hope I will implement your code properly.
Actually, my code is similar, but without clustering.

MrGerman

Awesome, bro, keep it up. Subscribed :)

nkitpatel

It looks like this lets us perform one specific task on all the pages in parallel. What if we have unique tasks that need to be done on multiple pages in parallel?

abdulhannan

Great tutorial, Mike! Thanks a lot. One question: how can I create a JSON file from different scraped pages and keep updating that JSON data every minute? Any suggestions for that task? Thanks

omargian_stw

Please, I need your help. I can't scrape data as a mobile device using puppeteer-cluster after setting the user agent to a mobile device.

miesineagent

Amazing tutorial. I know it's been a while, but could you please answer my question? I got everything working and can scrape multiple pages at the same time, but I want to make a request for each page. I currently make the request in the await cluster.task function, but each request can take a while (around 1 minute or even 2) and I want to make sure it finishes before continuing. I currently get the error: Error Crawling: "websiteurl": Timeout 30000

harimzermeno

Hi, thanks a lot.

In CONCURRENCY_PAGE mode, we can open multiple tabs in the same single browser at the same time, but my problem is that after one URL finishes, the tab is closed and a new one is opened for the next URL. I think this closing and opening of a tab for each URL will consume more time and machine resources. Do you know how I can navigate to another URL in the same tab instead of closing it and opening a new one for the next URL?

mouhannadal-hmedi

At 16:27 I don't understand what you were doing. Why do we need lines 82 and 83?

mykun

Hello Michael,
Do you know if it's possible to deploy puppeteer-cluster to an AWS Lambda function?

zeroxdeveloper

Thanks. Do you know how to slow down the while loop count?

oladapoosunkeye

Brother, I am trying to run multiple Chrome browsers with different profiles but can't find a way. Is it possible using the cluster? If yes, how am I supposed to change the profile name every time in the Puppeteer launch? Please help me.

gammingloverpc

How can I make the page stay open longer in the browser when using cluster.queue to load a URL that I need for web scraping?

xiaoyunn

Do you need to install the puppeteer package as well as the cluster package, or is it all included in the cluster package?

levihalperin

Can I add to the queue dynamically from an Express request?

muhammadarifafandi

I'm trying to use cluster.execute and resolve promises, but I'm getting navigation errors.
Can you help?

LatestLyricals

How to combine puppeteer-cluster and ??

restianais

I am getting this error, please help:

Navigation timeout of 30000 ms exceeded
Error crawling

deer