Recursion ➰ for Paginated Web Scraping

We figure out how to deal with the paginated search results in our web scrape. RECURSION is our tool - not as difficult as you might think!!
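
The core idea, as a minimal sketch (the URL, the `.result` and `a.next` selectors, and all names here are illustrative placeholders, not the exact code from the video):

```
const puppeteer = require('puppeteer');

// Recursively scrape one page, then call ourselves for the next page
// until no "next page" link can be found.
async function scrapePage(page, url, results = []) {
  await page.goto(url, { waitUntil: 'networkidle2' });

  // Collect the text of every result item on the current page.
  const items = await page.$$eval('.result', els => els.map(el => el.textContent.trim()));
  results.push(...items);

  // Look for a link to the next page; $eval throws when nothing matches,
  // so a missing link becomes null, which is our base case.
  const nextUrl = await page.$eval('a.next', a => a.href).catch(() => null);
  if (!nextUrl) return results;

  // Recursive case: scrape the next page, carrying the results along.
  return scrapePage(page, nextUrl, results);
}

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  const all = await scrapePage(page, 'https://example.com/search?page=1');
  console.log(`Scraped ${all.length} items`);
  await browser.close();
})();
```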

🗿 MILESTONES
⏯ 00:12 Fika 🍪
⏯ 13:10 Extracting the next page number with regex (see the sketch after this list)
⏯ 16:50 Encounter with prettier... 🌋
⏯ 18:39 ➰ Recap
⏯ 20:15 TIME FOR RECURSION 😎
⏯ 29:00 Quick Google rant 🌋
⏯ 29:23 ➰➰ Rerecap by Commenting the Code
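
For the 13:10 milestone, here is a sketch of pulling a page number out of a URL with a regex; the URL shape is invented for illustration:

```
// Grab the trailing page number from a paginated URL and build the next one.
const url = 'https://example.com/partners/page/3';
const match = url.match(/(\d+)$/); // capture the digits at the very end

if (match) {
  const currentPage = Number(match[1]);
  const nextUrl = url.replace(/\d+$/, String(currentPage + 1));
  console.log(nextUrl); // https://example.com/partners/page/4
}
```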

See the previous episode, where we explain Puppeteer and how to find the data to scrape

The code used in this video is on GitHub

Puppeteer - Node library that drives headless Chrome for scraping (instead of PhantomJS)

The editor is called Visual Studio Code and is free. Look for the Live Share extension to share your environment with friends.

DevTips is a weekly show for YOU who want to be inspired 👍 and learn 🖖 about programming. Hosted by David and MPJ - two notorious bug generators 💖 and teachers 🤗. Exploring code together and learning programming along the way - yay!

DevTips has a sister channel called Fun Fun Function, check it out!

#recursion #webscraping #nodejs

Comments

...and you just answered my question on the previous video! Thanks! I enjoyed these two on web scraping so much.

simoneicardi

These two web scraping vids are awesome! Would love to see one on building a crawler 🕸

naansequitur

I would have used the “next” button in the navigation and used its href to get the next page until there are no more next pages
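
A rough sketch of that approach, assuming placeholder 'a.next' and '.result' selectors (not the video's actual code):

```
// Keep following the "next" link's href until it disappears.
async function scrapeAllPages(page, startUrl) {
  const results = [];
  let url = startUrl;

  while (url) {
    await page.goto(url, { waitUntil: 'networkidle2' });
    results.push(...(await page.$$eval('.result', els => els.map(el => el.textContent.trim()))));

    // $eval throws when the selector matches nothing, so a missing
    // "next" button turns into null and ends the loop.
    url = await page.$eval('a.next', a => a.href).catch(() => null);
  }
  return results;
}
```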

justvashu

Love this video - learned so much and the guys are entertaining to listen to. Thanks

kasio

Thank you!!!
Excellent video that really helped when trying to figure out Puppeteer, and recursion on top of that!
I did find that the count in the recursion didn't like numbers over 9, so I added these two lines to account for pagination numbers of any length. (The first line was cut off here; the completion assumes the current page number lives in a variable like pageNumber.)
```
const digit = String(pageNumber).length; // assumed: how many digits the current page number has
const newStreet = street.slice(0, -digit); // strip that many characters (the old page number) off the end
```
thanks again for a well-timed video that saved the day :)

jolyonfavreau

Thank you so much David for this amazing scraping video.

g-you

I'm impressed that you didn't get an error saying 'browser is not defined'!

Joevanbo

Great tutorial, thank you so much for sharing! I am wondering how to design the function to stop once a certain number of found products has been reached (e.g. when 50 total partners are found, stop the recursion and proceed to other parts of the code)?
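
One way that could work, sketched with placeholder names ('.partner' and 'a.next' are assumptions): thread a limit through the recursion and make reaching it an extra base case.

```
// Stop recursing once `limit` items have been collected, even if more pages exist.
async function scrapeUpTo(page, url, limit, results = []) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  results.push(...(await page.$$eval('.partner', els => els.map(el => el.textContent.trim()))));

  // Base case 1: we have enough; trim any overshoot and stop.
  if (results.length >= limit) return results.slice(0, limit);

  // Base case 2: no next page left.
  const nextUrl = await page.$eval('a.next', a => a.href).catch(() => null);
  if (!nextUrl) return results;

  return scrapeUpTo(page, nextUrl, limit, results);
}

// e.g. const partners = await scrapeUpTo(page, startUrl, 50);
```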

avecho

David, great video. As for that h1 tag... they have a history of funny h1 tags on these landing pages. A little over a year ago, before the "360" rebranding changed their marketing site, I was looking at how they formatted their markup for SEO on one of their product pages. I noticed that the h1 tag was in the markup and said, for example, "Google Tag Manager...", but it was not visible to the user. If I remember correctly, on desktop the h1 tag had display:none attached to it. Then, once the hamburger menu breakpoint was crossed, it was still display:none until you opened the menu, at which point display:none was removed and the h1 tag was wrapped around an img element with an image of the stylized "Google Tag Manager..." The actual text "Google Tag Manager..." in the h1 tag was hidden with CSS and probably used as a fallback. After some research on Matt Cutts' blog I found out that this is semi-okay to do.

drewlomax

Why all the regex stuff over just passing the page number as an argument and creating the URL in the method?
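
Sketched out, the commenter's suggestion might look like this (the URL template is invented): keep the page number as a plain argument and build the URL from it on each call.

```
// Build the URL from a page-number argument instead of regexing it
// back out of the previous URL.
async function scrapeFrom(page, pageNumber = 1, results = []) {
  const url = `https://example.com/partners?page=${pageNumber}`;
  await page.goto(url, { waitUntil: 'networkidle2' });

  const items = await page.$$eval('.partner', els => els.map(el => el.textContent.trim()));
  if (items.length === 0) return results; // an empty page means we ran past the end

  results.push(...items);
  return scrapeFrom(page, pageNumber + 1, results);
}
```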

alexzanderflores

Great Vid! You guys should go over Docker next

charlyecastro

Hello, I have some basic Python web scrape code that saves to a CSV file; what code would we add here so we can save to a CSV file too, please?
Lisa, and thank you
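
In Node (rather than Python), a minimal dependency-free CSV dump could look like this; the { name, city } field names are just an example shape:

```
const fs = require('fs');

// Turn an array of scraped objects into CSV and write it to disk.
function saveAsCsv(rows, path) {
  const header = 'name,city';
  const lines = rows.map(r =>
    // Quote each field and double any embedded quotes, per CSV rules.
    [r.name, r.city].map(v => `"${String(v).replace(/"/g, '""')}"`).join(',')
  );
  fs.writeFileSync(path, [header, ...lines].join('\n'));
}

saveAsCsv([{ name: 'Acme', city: 'Oslo' }], 'partners.csv');
```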

pjmclenon

Thanks!
Hmm, silly-questions section here: the first rule of scraping is "be nice" (don't overload servers, etc.), so wouldn't it be nicer if we first copied all the result pages and scraped them locally? What's the general approach?
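
One polite pattern, as a sketch (paths and the '.result' selector are made up): fetch each page once, cache the rendered HTML to disk, and re-run the extraction against the local copies while iterating on selectors.

```
const fs = require('fs');
const path = require('path');

// First pass: download each page once and cache the rendered HTML.
async function cachePage(page, url, file) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  fs.writeFileSync(file, await page.content());
}

// Later passes: parse the cached file instead of hitting the server again.
async function scrapeCached(page, file) {
  await page.goto('file://' + path.resolve(file));
  return page.$$eval('.result', els => els.map(el => el.textContent.trim()));
}
```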

trendYou

Did the same thing with another website. Everything is the same, but sometimes it returns an empty array [], and sometimes it scrapes only 10 pages even though there are 14. Why is that? I am so tired.
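
Without seeing the site this is only a guess, but intermittent empty arrays often mean the extraction runs before the list renders. Waiting for the selector first (placeholder '.result' here) usually helps:

```
// Guard against scraping before the results have actually rendered.
async function scrapeWhenReady(page, url) {
  await page.goto(url, { waitUntil: 'networkidle2' });
  await page.waitForSelector('.result', { timeout: 10000 }); // wait for render
  return page.$$eval('.result', els => els.map(el => el.textContent.trim()));
}
```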

kainarilyasov

Hi David! My pagination URL has no page parameter. Is there any way to scrape the AJAX response? The required content is loaded via AJAX on the client side.
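
Puppeteer can watch the network instead of the DOM; a sketch using page.waitForResponse (the '/api/items' fragment and 'button.load-more' selector are invented):

```
// Trigger the AJAX call and grab the JSON straight off the wire.
async function scrapeAjax(page) {
  const [response] = await Promise.all([
    page.waitForResponse(res => res.url().includes('/api/items') && res.status() === 200),
    page.click('button.load-more'),
  ]);
  return response.json(); // the parsed AJAX payload
}
```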

sridharnetha

Hello everyone, I think it does not work anymore. The class 'Compact' is no longer there. How do I fix that? I tried with 'Landscape' and it returns an empty array either way.

JohnnyMylot

I have a serious, and only slightly related, question. The truth is I am not a coder; I am renting software via ParseHub. I can use the software just fine, but the website I am scraping, despite having tens of thousands of desired results, has a page limit of 15. There is no way I can get the amount of information I need from such small scrapes. Is there any way to bypass this page limit and gain access to the totality of the actual results, as opposed to the pitiful amount I am actually able to see at this time?

logandarsee

I'd like for you to deploy this (maybe to Firebase Hosting, using a Firebase Cloud Function). You would probably run into an annoying CORS error, so I'd be interested to see how you resolve it. For myself, following the CORS tips in the Firebase Cloud Functions docs doesn't seem to help with web scraping with Puppeteer. :(
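
Untested against this exact setup, but the usual shape for CORS in a Firebase HTTP function wraps the handler in the cors middleware; whether Puppeteer itself runs happily inside the function (memory, --no-sandbox) is a separate question:

```
const functions = require('firebase-functions');
const cors = require('cors')({ origin: true });

exports.scrape = functions
  .runWith({ memory: '1GB' }) // headless Chrome wants extra memory
  .https.onRequest((req, res) => {
    cors(req, res, async () => {
      // ...launch Puppeteer with { args: ['--no-sandbox'] } and scrape here...
      res.json({ ok: true });
    });
  });
```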

JBuchmann

David, please bring back the music when you timelapse :) Interested to see where this project is going. Keep it up, always looking forward to the next episode of this series.

Soundtech

While the cat's away the fika comes out to play.

Laek