“Wait, this Agent can Scrape ANYTHING?!” - Build universal web scraping agent

preview_player
Показать описание
Build an universal Web Scraper for ecommerce sites in 5 min;

🔗 Links

⏱️ Timestamps
0:00 Intro
3:00 Challenges with web scraping
6:05 How LLM enable universal web scraper
10:51 Potential solutions
18:36 Solution 1: API based web agent - Researcher
25:81 Solution 2: Browser based agent - Universal ecommerce scraper

👋🏻 About Me

#agents #webscraping #scrapers #webagent #gpt5 #autogen #gpt4 #autogpt #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #bestaiagent #chatgpt #agentgpt #agent #babyagi
Рекомендации по теме
Комментарии
Автор

It's interesting how much performance gain you got from clean markdown data like firecrawl, sometimes you dont need much stronger reasoning, you just need to give agent better tools

Joe-bpmo
Автор

I am already doing this. Its the same way I trained models to play video games - take a screensshot, convert to greyscale, but instead of inserting that into a CNN, I pipe it into an agent that I built and it has mouse and keyboard tools instead of the typical selenium/headless tools. It works pretty damn good although some models will refuse cpatchas outright.

agenticmark
Автор

Gonna try out the 2 examples soon, and please please launch the universal web scraping agent, i will pay you for that in a heartbeat!

jasonfinance
Автор

You talked about 'universal scrapers' then you used a bunch of expensive services to create a very vanilla hyper-specific scraper that doesn't' require LLMs at all.... hmm....

googleyoutubechannel
Автор

Holy shit, that universal ecommerce scraping agent in the end is sick, thanks for sharing that framework!!

Jim-eyry
Автор

Really Love his accent and voice, very soothing and clear

Chris_Faraday
Автор

perplexity should use this crawler since their models are hallucinating reference URLs LOL

amandamate
Автор

dear jason, i am really amateur with coding so i don't have a clue on so many topics that i try to execute. i have come across some of your interesting videos while trying to achieve but failed miserably on most of em. but today i just came for the thumbnail and rolling my sleeves to implement this masterpice. thank you so much & peace from 🇹🇷

nestpasunepipe
Автор

I am recently thinking about this idea too. Many thanks for sharing your result!!

elon-randgul
Автор

I never knew web scraping was so hard. I mean, I ve been trying to scrape together a decent Instagram following for years, but I guess that's not what they mean by web scraping.

Anyway, who knew websites were like the cool kids at school, only loading their content when you scroll into their 'cool zone' and making you jump through hoops to get to the good stuff

MechanicumMinds
Автор

We are in a world where data is the most sought after commodity. And AI is going to make accessing information trivial. I wonder how Big Business will respond. I suspect they'll start pushing for laws to criminalize web scraping in the not too distant future. It will be interesting to see how this all plays out in the years to come.

damionmurray
Автор

The cost per request for this must be through the roof!

danielcave
Автор

With all these expensive tools, I think it will best to build with playwright.

Though it will take weeks or months, but it will be cost effective.

AllenGodswill-imop
Автор

In movies they do all they can so the AI cannot access the internet, in real life : we need web scrapping man, give it access!

bernardthongvanh
Автор

Hi Jason, Your second example doesn't work. AgentQL doesn't open the amazon page.

brianchow-rglo
Автор

I don't believe it's possible to create a universal scraping solution that would be efficient in many edge cases. A custom solution would likely be faster and cheaper, especially if you need to scale.

I've evaluated a lot of scraping SaaS services and used everything from Selenium to headless browsers. There are so many protection mechanisms, including headers, API checks, cookies, etc., and I'm sure I haven't seen a fraction of them. Some sites even require the browser to load JS and render changes on screen.

With AI, we can get closer to an ideal solution. For example, you could take a screenshot if necessary (if the data is graphic and not part of the HTML source) and at the same time scrape the HTML. Then, pass them together to an LLM with your question. The structured data should then answer what you need it to become.

However, you need to run the LLM yourself. Any solution using an LLM should allow users to provide an extraction schema, which needs to be very flexible as a prompt. This could be a nice service for hobbyists, but for scale, it would be too expensive. A custom implementation would probably serve better.

syberkitten
Автор

I wonder if this is an Advertisement video or a knowledge sharing video..Nothing is open source.

yashsrivastava
Автор

The cost of making is comparatively so costly than creating a website specific scrapper and maintaining it.

justafreakable
Автор

10:42 i follow tutorial, build scraper with cleanmymac, nothing happen, install twice, Ubuntu 22.04 only get many index.html

kilianlindberg
Автор

Great work!! I'm currently tackling web scraping challenges, especially with certain sites where determining the delivery location or dealing with pop-ups obstructing the content poses issues. This often requires user action before the search query can proceed. What do you believe are the most effective methods or tools to overcome these hurdles? Sometimes, even the agentql struggle to resolve these issues.

eduardoribeiro