How to Scrape JavaScript Websites with Scrapy and Playwright

No page is out of reach! Using Scrapy and Playwright together gives us the best of both worlds: JavaScript rendering and Scrapy's data scraping capabilities. In this project I will show you how to get started with a basic scraper on a JavaScript-heavy website, using scrapy-playwright. By putting the headless browser in front of Scrapy to make the requests, we are able to render the page, and even wait for certain selectors to be visible, before the page DOM/HTML is returned and parsed with Scrapy.

Doing it this way has many benefits: Scrapy items, item loaders, pipelines and middleware are all available to us. There are a few drawbacks, however. Any web scraping that uses a real browser is inherently slower; we can't avoid this, because the method requires starting a browser to access the page. It does, however, give us access to sites that we would previously have had trouble scraping.
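
To make the "items and pipelines" benefit concrete, here is a small sketch of an item and a cleaning pipeline that could sit behind a Playwright-rendered response. The `QuoteItem` and `CleanQuotePipeline` names and the fields are hypothetical. The sketch relies on two Scrapy behaviours: dataclass instances are accepted as items, and a pipeline is any class with a `process_item` method.

```python
from dataclasses import dataclass


@dataclass
class QuoteItem:
    """Hypothetical item; Scrapy accepts dataclass instances as items."""

    text: str
    author: str


class CleanQuotePipeline:
    """Hypothetical pipeline that tidies the fields scraped from the rendered page."""

    def process_item(self, item, spider):
        # Trim whitespace, then drop the curly quotes wrapped around the text.
        item.text = item.text.strip().strip("\u201c\u201d")
        item.author = item.author.strip()
        return item
```

A pipeline like this would be enabled through the `ITEM_PIPELINES` setting; the module path used there depends on the project layout. Called directly, `CleanQuotePipeline().process_item(QuoteItem(" \u201cHello\u201d ", " Jane "), spider=None)` returns an item whose `text` is `Hello` and whose `author` is `Jane`.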

Comments

I started my first playwright project after constantly failing to extract json from an endpoint because of some graphql nonsense. My constant thought was "I sure wish I could integrate playwright with scrapy." You and the algorithm gods have answered my prayers.

alexanderscott

I've only just started scraping (lol) the surface of web scraping, so a lot of your content goes over my head, but your videos are really great and a complete gold mine for anyone who is trying to learn. Thank you!

tommifish

This YouTube channel is probably the only one with the best website crawling software and techniques I've seen! Thank you very much for the amazing content, John! You should make a course about this stuff, really useful.

drac.

I thought about scrapy + playwright as a replacement for selenium and now you upload this. Thank you so much!

adnanpramudio

Hey John! I rarely comment on YouTube videos, but I just have to say that your content is golden. Keep it up!

realpropagandalf

I have just started using scrapy for crawling, your videos are very helpful. 👍

ruhollahmozafari

A great new library that helps with dynamic pages, thanks a lot John

CodePhiles

Hi, I'm currently working with crawling in my job, and your videos are helping me a lot!

SamirMamude

Thanks so much for introducing another great tool! Definitely worth learning after Selenium/Helium. Great job again John!

celerystalk

Hi John, I am trying to run the exact same code you showed here on my Windows machine, but I am getting a lot of errors like "AttributeError: 'PipeTransport' object has no attribute '_output'" and "AttributeError: object has no attribute 'browser_type'". I have used the exact same settings as you did. Kindly help me. Thanks

spotshot

Great video, man. I want to see more videos on Scrapy with Playwright

automationhungry

That was exactly what I was looking for, thank you! (splash wasn't able to load javascript)

samibdh

Awesome video, very well explained. Definitely worth the time. Pure gold. Thank you.

gianfrancodagostino

Awesome video! Could you also make a video about scraping websites that make repetitive calls to an API and then use JavaScript to format the JSON response (i.e. making direct calls to the API returns gibberish JSON values)? Thanks a lot mate.

dennistanui

Playwright is making scraping life easy. Great 💖

tubelessHuma

ERROR: AttributeError: 'PipeTransport' object has no attribute '_output', same code, can you fix this please?

vasugupta

Is anyone else getting this error: AttributeError: 'PipeTransport' object has no attribute '_output'

jensshumway

I'm new to python and scrapy. Following your tutorials, in just 3 days I've been able to build a scraper and get a much better understanding of scrapy and python. My current site has pagination that is done in javascript, so it's my understanding that I'll need to use splash or playwright. Which one would you recommend for a beginner?

Scuurpro

This video is great! Now I've got to figure out how to customise this for a login page.

melih.a

Hi John, thank you for the videos, they have helped me a lot! I am a bit stuck at the moment with the JS website. How can I do the "callback" to go to the next page when I have 2 functions now? I have tried to run them in a while loop but with little result. How would you do it in this example if it had multiple pages?

greoipsec