How To Scrape (almost) ANY Website with Python


Browser automation isn't generally my go-to for scraping, but sometimes it gives us an easy option for grabbing data. Scaling is an issue, however; combining Playwright with Scrapy gives us a solid, robust scraping method to add to our repertoire.
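
As a rough illustration of the Playwright and Scrapy pairing, here is a minimal sketch that assumes the scrapy-playwright plugin is installed. The URL, the spider name, and the h2 selector are hypothetical placeholders, not taken from the video. Third-party imports are deferred into the functions so the settings can be inspected without the packages installed.

```python
"""Sketch: routing Scrapy requests through Playwright with scrapy-playwright."""

# JavaScript that jumps to the bottom of the page to trigger lazy loading.
SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight)"

# Settings the scrapy-playwright README asks for: its download handler
# for both schemes, plus the asyncio-based Twisted reactor.
PLAYWRIGHT_SETTINGS = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
        "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    },
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
}


def build_spider_class():
    """Return a spider class; needs scrapy and scrapy-playwright installed."""
    import scrapy
    from scrapy_playwright.page import PageMethod

    class ScrollSpider(scrapy.Spider):
        name = "scroll_example"  # hypothetical name

        def start_requests(self):
            yield scrapy.Request(
                "https://example.com/listing",  # hypothetical URL
                meta={
                    # Ask scrapy-playwright to render this request in a browser.
                    "playwright": True,
                    # Page methods run in order after navigation.
                    "playwright_page_methods": [
                        PageMethod("evaluate", SCROLL_JS),
                        PageMethod("wait_for_timeout", 1000),
                    ],
                },
            )

        def parse(self, response):
            # The response body is the rendered HTML, so normal selectors work.
            for title in response.css("h2::text").getall():
                yield {"title": title.strip()}

    return ScrollSpider


if __name__ == "__main__":
    from scrapy.crawler import CrawlerProcess

    process = CrawlerProcess(settings=PLAYWRIGHT_SETTINGS)
    process.crawl(build_spider_class())
    process.start()
```

A single scroll is rarely enough for a long infinite-scroll page; repeating the evaluate and wait steps, or waiting for a selector, are the usual adjustments.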

JavaScript to Scroll to the bottom of the page:
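
The snippet itself is not reproduced in this description. A commonly used one-liner for this job is window.scrollTo(0, document.body.scrollHeight). The sketch below is an assumption about how that could be driven from Playwright's sync API and parsed with selectolax; it is not the code from the video, and the URL and selector are hypothetical.

```python
"""Sketch: scroll an infinite-scroll page with Playwright, parse with selectolax."""

SCROLL_JS = "window.scrollTo(0, document.body.scrollHeight)"
HEIGHT_JS = "document.body.scrollHeight"


def scroll_to_bottom(page, max_rounds=10, pause_ms=1000):
    """Scroll until the page height stops growing or max_rounds is reached.

    `page` is a Playwright sync Page (or anything with the same
    evaluate / wait_for_timeout methods). Returns the number of scrolls.
    """
    last_height = page.evaluate(HEIGHT_JS)
    rounds = 0
    for _ in range(max_rounds):
        page.evaluate(SCROLL_JS)
        page.wait_for_timeout(pause_ms)  # give new content time to load
        rounds += 1
        new_height = page.evaluate(HEIGHT_JS)
        if new_height == last_height:
            break  # nothing new was appended, so assume we hit the end
        last_height = new_height
    return rounds


def fetch_texts(url, selector):
    """Render `url`, scroll to the end, and return text of matching nodes."""
    # Third-party imports are kept local so scroll_to_bottom can be
    # reused or tested without a browser installed.
    from playwright.sync_api import sync_playwright
    from selectolax.parser import HTMLParser

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        scroll_to_bottom(page)
        html = page.content()
        browser.close()

    tree = HTMLParser(html)
    return [node.text(strip=True) for node in tree.css(selector)]


if __name__ == "__main__":
    # Hypothetical target and selector, for illustration only.
    for text in fetch_texts("https://example.com/listing", "h2"):
        print(text)
```

Comparing the page height before and after each scroll is a simple stopping rule; sites that load content slowly may need a longer pause or a wait on a specific selector instead.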

Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases

# timestamps
00:00 Intro
01:49 Playwright & Selectolax
07:14 Playwright & Scrapy
Comments

Another great video! Thanks for showing both methods. 💯

stewart

Brilliant! Love your attitude, admire an out-of-the-box thinker! Keep up the good work, buddy!

tomermolnar

Great video content about web scraping. You're doing amazing, bro.

malwaredev

You are a great tutor, and I suggest a video discussing and comparing all of these tools: why and when we could use them, and what the best combo is.

Great work, keep making tutorials.

abdelrhmanabbas

Hey, the video is really really helpful. Thank you very much for it! You are the go to channel for me whenever I wish to research on any topic related to web-scraping. You're doing a great job man!

Also, at the end of the video you said that this is not your preferred method for scraping infinite-scroll dynamic websites. So which one is your preferred method that is also scalable?

devpala

This is so timely for me @John, as I was literally building a scraper yesterday to scrape a website that used XHR. Top content! Additionally, would it be possible for you to share the JavaScript code that was used in the PageMethod function?

podcaste

Need you to make an nvim setup video because that's cool af

ishandandekar

Can you please do a video on your Neovim configuration?

giftcp

Thanks for another great video!
This method seems so easy and I wanted to try it myself, but unfortunately it seems that scrapy-playwright doesn't work on Windows. Some sort of Linux emulation (WSL) is required.

Also, thanks for the IPRoyal discount. I was looking for such a service and your discount comes at just the right time; I will use it after the NYE party :)

PS: Everyone, a Happy New Year!

StefanFlorescu-uruv

I'm literally just getting started with Python and need a fast study done for my thesis, so I decided to study word usage on Reddit. Should I go through with it? Idk if I need any special stuff :/ I don't even have Python installed. Cheers <3

greyngreyer

Great video, John. Which editor are you using?

doodelinux

I watched the section between 4:30 and 5:00 (roughly) so many times.

The off-by-one space there was extremely distracting as well as satisfying when fixed.

Cheers

Drtsaga

Is Playwright faster and lighter than Selenium?
I know Selenium, but I don't know if it's worth starting to learn Scrapy and Playwright. Should I start?
Also, should I use Playwright or Splash? If I want to scrape data from dynamic and authenticated pages, what should I use?
I would love and appreciate any information or advice. Thank you! Sorry if my English is not perfect.

DwellingsOfficial

Hey, how can we scrape PDFs that are embedded to be viewed in Chrome's PDF preview? I think they use JavaScript.

vishalsugandh

As a newbie... does anyone have experience with pup, a command-line tool for processing HTML? Is there any way to import it into a Playwright project the same way as HTMLParser? Thanks.

_manasikara

Instead of Playwright, can we use Splash for any project? Which is recommended for web scraping?

ervankurniawan

Hi John. What options do you suggest if I have to save a screenshot of a webpage as a JPEG, or save the HTML itself? Is that possible with Scrapy?

vinubalank

Off topic, but will you do a video on websockets?

yBlade

This is nice, but my problem with using Playwright is that the Twisted reactor always leads to issues when I want to run my spiders from Python scripts.

jeroenvermunt

How do I run a Playwright script in a Jupyter notebook?

Kalter_int