Scraping with Playwright 101 - Easy Mode

Playwright is an incredibly versatile tool for browser automation, and in this video I run through a simple project to get you up and running scraping data with PW & Python.

If you are new, welcome! I am John, a self-taught Python developer working in the web and data space. I specialize in data extraction and JSON web APIs, both server and client. If you like programming and web content as much as I do, you can subscribe for weekly content.

:: Links ::

:: Disclaimer ::
Some/all of the links above are affiliate links. By clicking on these links I receive a small commission should you choose to purchase any services or items.

:: Chapters ::
00:00 - checking site
02:18 - start code
05:24 - detail page
11:13 - pagination
16:30 - summary and run
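
The chapters above (listing page, detail page, pagination) can be sketched roughly as follows with Playwright's sync API. This is a minimal sketch and not the code from the video: the start URL and all three selectors are hypothetical placeholders, and the Playwright import is deferred into `scrape()` so the small URL helper can be used on its own.

```python
from urllib.parse import urljoin

# Hypothetical placeholders: swap in the real site URL and selectors.
START_URL = "https://example.com/products"
LINK_SELECTOR = "a.product-link"
TITLE_SELECTOR = "h1"
NEXT_SELECTOR = "a.next"


def absolute_links(base_url, hrefs):
    """Resolve hrefs against the page URL, skipping empties and duplicates."""
    seen, out = set(), []
    for href in hrefs:
        if not href:
            continue
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            out.append(url)
    return out


def scrape(start_url=START_URL, max_pages=3):
    # Imported here so absolute_links() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    rows = []
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        listing = browser.new_page()  # stays on the paginated listing
        detail = browser.new_page()   # reused for every product page
        listing.goto(start_url)
        for _ in range(max_pages):
            anchors = listing.locator(LINK_SELECTOR).all()
            hrefs = [a.get_attribute("href") for a in anchors]
            for url in absolute_links(listing.url, hrefs):
                detail.goto(url)
                title = detail.locator(TITLE_SELECTOR).first.inner_text()
                rows.append({"url": url, "title": title})
            next_link = listing.locator(NEXT_SELECTOR)
            if next_link.count() == 0:
                break  # no further pages
            next_link.first.click()
            listing.wait_for_load_state()
        browser.close()
    return rows
```

Calling `scrape(max_pages=1)` would stop after the first listing page; keeping the listing and detail pages in separate tabs means the pagination state is not lost while product pages load.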
Comments

Great walk-through with solid tips! Thank you for sharing this.

allenbrokeit

This is awesome!!
As an API Security Specialist, I always start by looking at the HTTP calls, searching for an API call that might have that same info, which saves me the time of scraping the page. Most of the time I’m having success with that approach, especially when dealing with solid companies/websites/platforms.

bigoper
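
One way to try the approach described in this comment is to let Playwright log every JSON response a page triggers, then inspect those URLs by hand. This is a sketch under assumptions: the content-type check is a simple heuristic, and `log_api_calls` is a hypothetical helper name, not something from the video.

```python
def is_json(content_type):
    """Heuristic: treat any content type mentioning JSON as an API response."""
    return "json" in (content_type or "").lower()


def log_api_calls(start_url):
    # Imported here so is_json() can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    found = []

    def on_response(response):
        # Playwright exposes response header names in lower case.
        if is_json(response.headers.get("content-type", "")):
            found.append((response.status, response.url))

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.on("response", on_response)  # register before navigating
        page.goto(start_url, wait_until="networkidle")
        browser.close()
    return found
```

If one of the logged URLs returns the product data directly, it can often be requested without rendering the page at all, though that depends entirely on the site.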

Thank you for your content. I'm just a beginner and it's really helping.

StanHordon

Great one!
I think that using the pytest-playwright package can save several lines of code in the initialization part, because you can just use the page: Page fixture.

alexanderkomanov

Good content as always. Enjoy your Easter break 😉👍

graczew

Waaaay, I just found schema on other websites. Nice trick anyway, but I find it more efficient to read the info from the category pages. Thanks for your videos, they always inspire me!!!

Extrey
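
Assuming the "schema" in this comment means schema.org data embedded as JSON-LD in `<script type="application/ld+json">` tags, it can be pulled out of a page's HTML with the standard library alone. This is a sketch; `extract_json_ld` is a hypothetical helper name.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the parsed contents of every JSON-LD script tag."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the scrape


def extract_json_ld(html):
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items
```

With Playwright, the HTML could come from `page.content()`, for example `extract_json_ld(page.content())`. Not every site publishes this markup, so the result may be an empty list.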

Hey John, can you please continue the scraping livestream with your test site? 😃
Would love to see how to handle the drop-down menus and JavaScript, and how to handle stricter Cloudflare rules.
Would be happy to hear about some news! Enjoy Easter :)

fredde

Thank you John for the teaching. I seem to have an issue with Xvfb when running 'headless'. Any suggestions or resources that I can learn from?

elu

Really well explained! Is there a way to run the loop in the original browser? Say, if we're only interested in the first page of the pagination and the products on page 1 only.

donaldandmijung

Thank you John, I've been really enjoying your videos recently and applying everything at work, where it comes in really handy. Would you consider creating a Python/scraping course on Udemy or a similar platform?

carloiurcovici

Can you please start talking about some difficult cases:

- scraping a website that has Cloudflare protection against bots (even with proxy rotation it didn't work)
- scraping websites that have CAPTCHA protection
..

Thank you

badrenanna

I'm following this exact code in VSCode and only the initial page opens; it doesn't open the subsequent pages for each of the products. No idea how to fix this...

wuipmpz

Would've been nice to have the URL and selectors in the description for us to copy and paste into our code instead of typing the ridiculously long strings.

dontwanttojoingoogle

Thanks John, but nowadays most websites don't allow you to open links like you do; they will block you after 3 or 4 pages are open at the same time.

Another question: if you could make a video on how to use Playwright inside Docker with a proxy to make many requests at the same time, that would be very nice.

Sorry for my English, I'm not a native speaker.

alexdin

Sir, can you please make a video on how to deploy a Playwright script on Google Cloud Functions / VPC?

IshaqKhan

Can’t you just set a viewport for the screen size, plus headers, and run it headless with no issue?

syx
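
What this comment describes can be sketched with the options Playwright accepts on `browser.new_context()`. The specific viewport size, user agent string and header values below are illustrative assumptions, and whether they are enough to avoid blocking depends on the site.

```python
# Illustrative values only; adjust to match a real browser profile.
CONTEXT_OPTIONS = {
    "viewport": {"width": 1920, "height": 1080},
    "user_agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "extra_http_headers": {"Accept-Language": "en-GB,en;q=0.9"},
}


def fetch_title(url):
    # Imported here so the options above can be inspected without Playwright.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # Every page opened from this context shares the same settings.
        context = browser.new_context(**CONTEXT_OPTIONS)
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
    return title
```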

Your content is good, but I think you should engage with your audience more instead of speaking like you are talking to yourself. You will see that you get many more views. Take the GothamChess channel, for example: he is not a chess Grandmaster, but his channel has more views and subscribers than Hikaru and Magnus because of his communication skills.

Sir-Ahmad-Khan