Render Dynamic Pages - Web Scraping Product Links with Python

Thanks to Stuart for sending this site in! I enjoyed this scraping challenge.

This video shows a simple method for handling dynamically loaded content. I use the requests-html library to render the page in the background quickly and efficiently, then scrape all the product links from the HTML div using an XPath selector. Finally, I loop through each link to get all the product information.

Coming in part 2 - pagination and functions to tidy up the code.
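
The approach described above can be sketched roughly as follows. This is a minimal sketch, not the code from the video: the function name scrape_product_links and the URL and XPath shown in the usage comment are hypothetical, and it assumes requests-html is installed (the first render() call downloads Chromium).

```python
def scrape_product_links(url, container_xpath, render_sleep=1):
    """Render `url` and return the absolute links found inside one container."""
    # Imported lazily so this file loads even without the third-party package.
    from requests_html import HTMLSession

    session = HTMLSession()
    response = session.get(url)
    # Run the page's JavaScript in headless Chromium before parsing.
    response.html.render(sleep=render_sleep)
    # first=True returns a single element, or None when nothing matches.
    container = response.html.xpath(container_xpath, first=True)
    if container is None:
        return []
    # absolute_links is a set of URL strings; sort it for a stable order.
    return sorted(container.absolute_links)


# Hypothetical usage (placeholder URL and XPath, not the site from the video):
#     for link in scrape_product_links("https://example.com/products",
#                                      '//*[@id="product-list"]'):
#         print(link)
```

Returning an empty list when the XPath matches nothing avoids the 'NoneType' errors that appear when the selector does not fit the rendered page.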

Comments
Author

Keyboard too loud? I've been using my mech kb again. Is it too distracting?

JohnWatsonRooney
Author

I'm going through ALL of your videos and just finished this one! Learning so much, it's incredible!

xilllllix
Author

THANK YOU for this video and all the others. I am learning web scraping to gather data for my PhD thesis and you have helped me make such great progress in just a few days. :)

schlotto
Author

Amazing explanation skills! Everything was clear. One of the greatest videos on web scraping so far! Good job, good luck!!

ottomanasina
Author

I can get data from static websites using scrapy with relative ease, but I always come unstuck when I try the same with dynamic websites. I might give 'requests-html' a go instead of my usual scrapy-selenium combo... Thanks for the video! 👊👊👊

edcoughlan
Author

Man, this is some amazing content. So glad I found your channel! Definitely earned a subscribe.

kewl
Author

This was super useful! I have a project rn that needs to scrape many pages that need rendering. This looks much more lightweight than what I'm using rn (selenium).

mia_bobia_
Author

Lifesaver! Thank you so much! Wish you the best of luck with your channel!

agsantiago
Author

Very clearly explained. May I ask if there is a GitHub repo containing the code that you used in the video?

neginbabaiha
Author

When I use XPath for products (on a different site, but the same principles), the terminal keeps returning 'None'. The site is GWT-based; would that stop XPath from working?

Aaron-qngu
Author

You are a great and creative person...keep going champ.

dobcs
Author

Awesome! I was searching for this type of scraping, and I found

farhadkhan
Author

Hi, I tried your code on another website, but when I got to the print(products) part, it returned a 'NoneType' object. The code gets no URLs. What should I do? I tried setting the user-agent, but it still returns nothing.

bagia
Author

Nice video, minus the try/except with no specific exception. I know this is a tutorial, but that's a bad habit to share. Regardless, thank you for the content.

Nope-
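
The point about catching a specific exception can be illustrated with a small, self-contained sketch. FakeElement and FakePage are hypothetical stand-ins for a parsed page (they are not part of requests-html); the only assumption is that find() returns None when nothing matches, so reading .text raises AttributeError.

```python
class FakeElement:
    """Stand-in for a parsed element; only the .text attribute is modelled."""

    def __init__(self, text):
        self.text = text


class FakePage:
    """Stand-in for a parsed page whose find() returns None on no match."""

    def __init__(self, fields):
        self._fields = fields

    def find(self, selector, first=True):
        text = self._fields.get(selector)
        return None if text is None else FakeElement(text)


def field_text(page, selector, default=None):
    """Return the text of the first match, or `default` if nothing matched."""
    try:
        return page.find(selector, first=True).text
    except AttributeError:
        # find() returned None, so there is no .text attribute to read.
        return default


page = FakePage({"h1.name": "Example Widget"})
print(field_text(page, "h1.name"))            # Example Widget
print(field_text(page, "span.price", "n/a"))  # n/a
```

Unlike a bare except, this lets unrelated problems (a KeyboardInterrupt, a network error, a typo that raises NameError) surface instead of being silently swallowed.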
Author

Hi John and everyone, I'm having trouble with the html.render() method and I'd appreciate any help.
The first time the method runs, it downloads Chromium. After I ran it, 3 red lines were printed (Downloading Chromium and other things I can't remember). It felt like it was taking too long (more than 10 minutes), so I stopped the program.
Now when I try to run the method, the script just gets stuck. It is running, but it never continues to the lines after the html.render() call. No errors are raised; the script simply never finishes.
I tried to pip uninstall requests-html and reinstall it, but I get the same uninformative result.
How can I troubleshoot this problem? I'm excited to work with requests-html and to let go of Selenium for standard rendering needs, but I can't.
Thanks a lot to anyone who cares enough to give it a try.

royteicher
Author

Hello John,
if I add the command r.html.render(sleep=1), the output is "Cannot use HTMLSession within an existing event loop. Use AsyncHTMLSession instead." I've searched on Google but found no clue. Any idea?

charisthawhite
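
This error typically appears in environments that already run an asyncio event loop, such as Jupyter notebooks. A minimal sketch of the async variant the message points to is below; the function name render_page and the URL in the usage comment are hypothetical, and it assumes requests-html is installed.

```python
async def render_page(url, render_sleep=1):
    """Fetch `url` with AsyncHTMLSession and render its JavaScript."""
    # Imported lazily so this file loads even without the third-party package.
    from requests_html import AsyncHTMLSession

    session = AsyncHTMLSession()
    response = await session.get(url)
    # arender() is the awaitable counterpart of render().
    await response.html.arender(sleep=render_sleep)
    return response.html


# Hypothetical usage in a notebook cell, where top-level await is allowed:
#     html = await render_page("https://example.com/products")
#     print(html.absolute_links)
```

In a plain Python script started from the terminal, no event loop is running yet, so the synchronous HTMLSession with render() should work without this change.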
Author

You are truly a lifesaver. Great, great video. Thanks, mate.

kavehyarohi
Author

John: when I follow your code, at "for item in products.absolute_links:", although I specify, e.g., 'div.product-subtext', the iteration only returns item.text (the link text of the item) and not the sub-text of the item. The same is true of price, name, and so forth. Can you explain this behavior?

justinames
Author

You missed an explanation: under what circumstances should you use XPath vs. div.<classname>?

Dome
Author

Thank you so much. Your video is going to help me a lot in a project that I'm about to start. One question, if you don't mind: when I want to gather text, but part of the text is hidden behind a [click for more] hyperlink, the text is not fully copied to the CSV file. Do you have any hints or suggestions? I appreciate your help in advance.

Mr.AIFella