Web Scraping with Python - Get URLs, Extract Data


This is the third video in the series of scraping data for beginners. We're going to add functionality to scrape from the actual product pages rather than just the search page. Adding in dataclasses will also help us handle our data.
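As a taste of the dataclass approach the video covers, here is a minimal sketch. The class and field names (`Product`, `title`, `price`, `sku`) are illustrative, not the video's exact schema:

```python
from dataclasses import dataclass, asdict

@dataclass
class Product:
    # illustrative fields; the video's actual schema may differ
    title: str
    price: float
    sku: str

p = Product(title="Trail Jacket", price=89.99, sku="TJ-100")
print(asdict(p))  # {'title': 'Trail Jacket', 'price': 89.99, 'sku': 'TJ-100'}
```

`asdict()` turns each record into a plain dict, which makes it trivial to hand off to `csv.DictWriter` later in the pipeline.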

This is a series so make sure you subscribe to get the remaining episodes as they are released!

If you are new, welcome! I am John, a self-taught Python (and Go, kinda..) developer working in the web and data space. I specialize in data extraction and JSON web APIs, both server and client. If you like programming and web content as much as I do, you can subscribe for weekly content.

:: Links ::

:: Disclaimer ::
Some/all of the links above are affiliate links; I receive a small commission should you choose to purchase any services or items through them.
Comments

John, you've made me re-enjoy scraping. I gave up due to how frustrating most tutorials are and the lack of real-world application with all of those stupid scraping demo sites. Thanks for all you do man

x_nietoh

Excellent video, great learning experience

eduardop

Hi, thanks a lot, the video is super clear and rich. I'm about to apply it to a similar website to grab details on products.

charlottegauthier

Excellent video series, much appreciated. Thank you for posting.

daveys

Another great presentation! Neat use of kwargs. Also, a very relevant use of data classes.

thebuggser

thank you! we need more of this sh!t
and I hope for a series like this on BeautifulSoup too

Lorem

"parse_page(html)" from lesson 2 suddenly became "parse_search_page(html: HTMLParser):" in lesson 3 without any explanation. Anyway great tutorial as well as a whole series. Very good for beginners.

Mac_Edits

you are a genius, man, thank you very much

abdifatahabdi

This is very helpful! I appreciate it a lot.

milyastroc

If we combine Playwright with this, can we then basically scrape any dynamic site (e.g. social media websites)?
Thank you so much John, this series is very fulfilling.

AliceShisori

Hi, kindly make a video on Python with Selenium, because no updated ChromeDriver is available and I don't know how to run the script now.
Thanks

muhammadhaddid

Good series! Personally I think the yield is a nice touch but probably not needed here given the weight of the script (and the generator itself doesn't help iteration, which was described as the reason for its inclusion); the dataclass is overkill vs a dict (we end up converting it back to a dict anyway); and so is **kwargs vs a single keyword argument that defaults to something like False or None (**kwargs gives the impression there may be more than one option; it's easier to use a single argument that defaults to a value when not passed in). Got a subscribe from me, thank you :)

darylkell
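To illustrate the commenter's point about `**kwargs` versus a single defaulted keyword, a small sketch (function and parameter names are made up for illustration, not from the video):

```python
# **kwargs version: flexible, but the signature hides the real interface
def export_kwargs(data: dict, **kwargs):
    if kwargs.get("save"):
        return f"saved {len(data)} fields"
    return "not saved"

# single keyword with a default: the signature documents itself
def export_simple(data: dict, save: bool = False):
    if save:
        return f"saved {len(data)} fields"
    return "not saved"

print(export_kwargs({"a": 1}, save=True))  # saved 1 fields
print(export_simple({"a": 1}))             # not saved
```

Both behave identically at the call site; the second makes the one supported option visible to readers and to autocomplete.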

Man, your videos are great. Your videos on Playwright have really been helpful. I was able to follow your videos and then make my own Playwright script in my project, until I got stuck dealing with dynamic pop-ups that I am unable to get past. I am supposed to enter a piece of data in those pop-ups (not captcha stuff), but I just can't make it work. It would help if you could cover dealing with dynamic pop-ups. Thanks.

KushalSharmatheOne

From this video on it is not understandable for beginners, since you decided for some reason to change all the code

rovolqg

Great video! Question: How can I find the extension that provides you with the errors next to the code?

juampivitalevi

Also, kindly add a product URLs column for each product and make it clickable when writing to the CSV

jaswanth
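On the clickable-URL request above: CSV itself has no notion of links, but spreadsheet apps (Excel, LibreOffice) typically evaluate an `=HYPERLINK(...)` formula written into a cell. A sketch, with illustrative column names and data:

```python
import csv
import io

rows = [{"name": "Trail Jacket", "url": "https://example.com/p/1"}]

buf = io.StringIO()  # stand-in for open("products.csv", "w", newline="")
writer = csv.writer(buf)
writer.writerow(["name", "link"])
for r in rows:
    # Excel/LibreOffice render =HYPERLINK(url, label) as a clickable cell
    writer.writerow([r["name"], f'=HYPERLINK("{r["url"]}", "view product")'])

print(buf.getvalue())
```

If the file will only ever be read by other programs, a plain URL column is safer; the formula trick is purely for humans opening the CSV in a spreadsheet.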

Great video! You've got a subscriber. After trying out the code a couple of times, I came across ReadTimeout error. How do we fix that?

abhin.v
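On the ReadTimeout question above: it usually means the server was slow to respond, and the two common mitigations are raising the client-side timeout and retrying with backoff. A generic retry sketch (the helper name and parameters are my own, not from the video):

```python
import time

def fetch_with_retry(fetch, retries=3, backoff=1.0, exceptions=(Exception,)):
    """Call fetch(); on a listed exception, sleep and try again."""
    for attempt in range(retries):
        try:
            return fetch()
        except exceptions:
            if attempt == retries - 1:
                raise  # out of attempts: let the error propagate
            time.sleep(backoff * (attempt + 1))  # linear backoff

# With httpx you would also raise the client-side timeout itself, e.g.:
#   resp = httpx.get(url, timeout=httpx.Timeout(30.0))
# and pass exceptions=(httpx.ReadTimeout,) to the helper above.
```

Being polite with a delay between retries also reduces the chance the site is throttling you in the first place.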

Based on one of your previous videos I figured out how to get nested objects from tricky divs. Thank you!
Could you please advise how, in the function below, I can get not only <p> elements but also <h2>, <pre>, and <ul><li> elements?
Should it be some sort of pipe-like syntax, "div.article-formatted-body > div > p | h2 | pre | ul | li |"?

def read_article(html):
    article_body = html.css("div.article-formatted-body > div > p")
    paragraphs = [i.text() for i in article_body]
    print(*paragraphs, sep='\n')

samoylov
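On the selector question above: CSS has no pipe operator; selector lists are comma-separated, so with selectolax something like `html.css("div.article-formatted-body > div > p, div.article-formatted-body > div > h2, ...")` should match all the wanted tags in document order (each alternative needs the full path repeated). As a dependency-free illustration of collecting text from several tag types in order, a stdlib sketch:

```python
from html.parser import HTMLParser

class BodyText(HTMLParser):
    """Collect the text of a fixed set of tags, in document order."""
    WANTED = {"p", "h2", "pre", "li"}

    def __init__(self):
        super().__init__()
        self.depth = 0    # are we inside a wanted tag?
        self.chunks = []  # one text chunk per wanted element

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self.depth += 1
            self.chunks.append("")

    def handle_endtag(self, tag):
        if tag in self.WANTED and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks[-1] += data

parser = BodyText()
parser.feed("<div><h2>Intro</h2><p>First</p><ul><li>one</li><li>two</li></ul></div>")
print(parser.chunks)  # ['Intro', 'First', 'one', 'two']
```

This is only a demonstration of the idea; for real scraping the comma-separated selector list with selectolax is the shorter route.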

Shouldn't the item number be an integer and the price a float?

atatekeli

Nice job! Is there a way to put this whole thing in a cron job or scheduler to run intermittently?

acharafranklyn
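On the scheduling question above: a crontab entry is the usual answer; for a pure-Python alternative, a minimal in-process runner (the helper and its parameters are my own sketch, not from the video):

```python
import time

def run_periodically(job, interval_seconds, max_runs=None):
    """Minimal in-process scheduler: run the job, sleep, repeat."""
    runs = 0
    while max_runs is None or runs < max_runs:
        job()
        runs += 1
        if max_runs is None or runs < max_runs:
            time.sleep(interval_seconds)
    return runs

# For unattended runs, a crontab entry is usually simpler, e.g. every 6 hours:
#   0 */6 * * * /usr/bin/python3 /path/to/scraper.py
```

Cron survives reboots and doesn't keep a Python process alive between runs, so prefer it for anything long-lived; the loop is fine for quick experiments.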