“Wait, this Agent can Scrape ANYTHING?!” - Build universal web scraping agent

Показать описание

Build an universal Web Scraper for ecommerce sites in 5 min;

🔗 Links

⏱️ Timestamps
0:00 Intro
3:00 Challenges with web scraping
6:05 How LLM enable universal web scraper
10:51 Potential solutions
18:36 Solution 1: API based web agent - Researcher
25:81 Solution 2: Browser based agent - Universal ecommerce scraper

👋🏻 About Me

#agents #webscraping #scrapers #webagent #gpt5 #autogen #gpt4 #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #agentgpt #agent #babyagi

Рекомендации по теме

Комментарии

It's interesting how much performance gain you got from clean markdown data like firecrawl, sometimes you dont need much stronger reasoning, you just need to give agent better tools

Joe-bpmo

I am already doing this. Its the same way I trained models to play video games - take a screensshot, convert to greyscale, but instead of inserting that into a CNN, I pipe it into an agent that I built and it has mouse and keyboard tools instead of the typical selenium/headless tools. It works pretty damn good although some models will refuse cpatchas outright.

agenticmark

Gonna try out the 2 examples soon, and please please launch the universal web scraping agent, i will pay you for that in a heartbeat!

jasonfinance

You talked about 'universal scrapers' then you used a bunch of expensive services to create a very vanilla hyper-specific scraper that doesn't' require LLMs at all.... hmm....

googleyoutubechannel

Holy shit, that universal ecommerce scraping agent in the end is sick, thanks for sharing that framework!!

Jim-eyry

Really Love his accent and voice, very soothing and clear

Chris_Faraday

perplexity should use this crawler since their models are hallucinating reference URLs LOL

amandamate

dear jason, i am really amateur with coding so i don't have a clue on so many topics that i try to execute. i have come across some of your interesting videos while trying to achieve but failed miserably on most of em. but today i just came for the thumbnail and rolling my sleeves to implement this masterpice. thank you so much & peace from 🇹🇷

nestpasunepipe

I am recently thinking about this idea too. Many thanks for sharing your result!!

elon-randgul

I never knew web scraping was so hard. I mean, I ve been trying to scrape together a decent Instagram following for years, but I guess that's not what they mean by web scraping.

Anyway, who knew websites were like the cool kids at school, only loading their content when you scroll into their 'cool zone' and making you jump through hoops to get to the good stuff

MechanicumMinds

We are in a world where data is the most sought after commodity. And AI is going to make accessing information trivial. I wonder how Big Business will respond. I suspect they'll start pushing for laws to criminalize web scraping in the not too distant future. It will be interesting to see how this all plays out in the years to come.

damionmurray

The cost per request for this must be through the roof!

danielcave

With all these expensive tools, I think it will best to build with playwright.

Though it will take weeks or months, but it will be cost effective.

AllenGodswill-imop

In movies they do all they can so the AI cannot access the internet, in real life : we need web scrapping man, give it access!

bernardthongvanh

Hi Jason, Your second example doesn't work. AgentQL doesn't open the amazon page.

brianchow-rglo

I don't believe it's possible to create a universal scraping solution that would be efficient in many edge cases. A custom solution would likely be faster and cheaper, especially if you need to scale.

I've evaluated a lot of scraping SaaS services and used everything from Selenium to headless browsers. There are so many protection mechanisms, including headers, API checks, cookies, etc., and I'm sure I haven't seen a fraction of them. Some sites even require the browser to load JS and render changes on screen.

With AI, we can get closer to an ideal solution. For example, you could take a screenshot if necessary (if the data is graphic and not part of the HTML source) and at the same time scrape the HTML. Then, pass them together to an LLM with your question. The structured data should then answer what you need it to become.

However, you need to run the LLM yourself. Any solution using an LLM should allow users to provide an extraction schema, which needs to be very flexible as a prompt. This could be a nice service for hobbyists, but for scale, it would be too expensive. A custom implementation would probably serve better.

syberkitten

I wonder if this is an Advertisement video or a knowledge sharing video..Nothing is open source.

yashsrivastava

The cost of making is comparatively so costly than creating a website specific scrapper and maintaining it.

justafreakable

10:42 i follow tutorial, build scraper with cleanmymac, nothing happen, install twice, Ubuntu 22.04 only get many index.html

kilianlindberg

Great work!! I'm currently tackling web scraping challenges, especially with certain sites where determining the delivery location or dealing with pop-ups obstructing the content poses issues. This often requires user action before the search query can proceed. What do you believe are the most effective methods or tools to overcome these hurdles? Sometimes, even the agentql struggle to resolve these issues.

eduardoribeiro

“Wait, this Agent can Scrape ANYTHING?!” - Build universal web scraping agent

“Wait, this Agent can Scrape ANYTHING?!” - Build universal web scraping agent

This AI Agent can Scrape ANY WEBSITE!!!

The Biggest Mistake Beginners Make When Web Scraping

Web Scraping AI AGENT, that absolutely works 😍

Advanced Web Scraping with Puppeteer: Avoid Looking Like a Bot and Pass Authentication!

User Agent Switching - Python Web Scraping

Always Check for the Hidden API when Web Scraping

Industrial-scale Web Scraping with AI & Proxy Networks

Making app with; claude dev, gpt engineer, lazy ai and replit agent

Scrape ANY Website with AI For Free | Best AI Tools

The easiest website SCRAPER of the year, Browse.ai is here to stomp APIFY

Scrape any website with OpenAI Functions & LangChain

Scrape Any Website with AI Locally and Free - ScrapeGraphAI

Autonomous AI agents and scraping for data gathering

Agent Stacking & Enhancing Data with Web Scraping and Make.com | No-Code Web Scraping

Web scraping with Large Language Models (LLM)-AnthropicAI + LangChainAI

How To Web Scrape Any Site With Make.com & AI

How to Scrape Websites Without Getting Blacklisted or Blocked

How to Set User Agent String while Web Scraping | Proxies API

How To Scrape Any Website

How I Scrape Everything with Apify & Make.com

Web Scraping with Python | Webscraping For Machine Learning

Render Dynamic Pages - Web Scraping Product Links with Python

5 Ways to Scrape Websites Without Getting Blocked