Check Out This INSANE AI Web Scraper

Watch full video here:

#shorts #ai #webscraping
Comments

For anyone wondering: this works with dynamic content; it isn't just a GET request fed into a parser. It uses a real browser, renders the page, scrapes the content, runs a semantic search over the HTML tags, passes the pruned DOM to an LLM, and uses the LLM to parse the content and format the result. So yes, it does actually use AI. The whole point is that you don't need to manually parse a site's DOM — we just let the LLM do that for us.

TechWithTim
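
The DOM-pruning step the pinned comment describes can be sketched with the standard library alone. The tag list and helper names here are illustrative assumptions, not the video's actual code — a minimal sketch of stripping noise tags so only content-bearing text would reach an LLM:

```python
from html.parser import HTMLParser

# Tags assumed to carry no scrapeable content (illustrative choice).
NOISE_TAGS = {"script", "style", "noscript", "svg", "nav", "footer"}

class DomPruner(HTMLParser):
    """Collects visible text, skipping anything inside a noise tag."""

    def __init__(self):
        super().__init__()
        self.depth_in_noise = 0  # nesting level inside noise tags
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.depth_in_noise += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.depth_in_noise:
            self.depth_in_noise -= 1

    def handle_data(self, data):
        if not self.depth_in_noise and data.strip():
            self.chunks.append(data.strip())

def prune(html: str) -> str:
    parser = DomPruner()
    parser.feed(html)
    return "\n".join(parser.chunks)

page = '<html><head><script>var x=1;</script></head><body><p>Price: $42</p></body></html>'
print(prune(page))  # → Price: $42
```

In a real pipeline the pruned text, not the raw page, is what gets sent to the model, which keeps the prompt small and cheap.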

You did not build an AI or a web scraper. You made a wrapper that glues the two together

jack

I did the same as my third-year project without using the AI label. All in Python and MySQL, btw

aceentertainment

Cool! This could be implemented in ad blockers or extensions to filter all the fluff out of web pages

Harambe_

Post-processing is awesome! People underestimate the power of two LLMs in series. I use whisper-1 to transcribe call recordings and GPT-4o to post-process the transcript, and it is awesome!

reinhartmengelberg
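
The two-models-in-series pattern this commenter mentions is just function composition: stage one's output becomes stage two's input. The functions below are offline stand-ins (assumptions for illustration, not real whisper-1 or GPT-4o calls) so the data flow is visible:

```python
def transcribe(audio_bytes: bytes) -> str:
    # Stand-in for a speech-to-text call (e.g. whisper-1).
    return "um so the uh meeting is at three pm"

def post_process(raw: str) -> str:
    # Stand-in for an LLM cleanup pass: drop filler words, fix casing.
    fillers = {"um", "uh", "so"}
    words = [w for w in raw.split() if w not in fillers]
    return " ".join(words).capitalize() + "."

def pipeline(audio_bytes: bytes) -> str:
    # The whole trick: feed stage 1's output straight into stage 2.
    return post_process(transcribe(audio_bytes))

print(pipeline(b"..."))  # → The meeting is at three pm.
```

With real models, the second stage would receive the raw transcript inside its prompt; the composition itself stays this simple.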

I love you, bro. Share as much knowledge with others as you can on this channel! You're already on the right path. Much love, TIM

AinaqEntertainment

Damn, dude, that's exactly what I need to build for a competition.

Will give it a shot soon

saurabhchaudhary

This one hits hard. Such a good optimisation!


Behind every "pretty cool", there is some AI help taken.

xsdash

But won't there be a ton of errors if you're using an LLM? In my experience they tend to just make stuff up or get it wrong. Wouldn't coding a more traditional, robust web scraper make more sense?

d-rey

Amazing. I was thinking about how to build a good web scraper, and you made this for me. Thanks, Tim!

aladinmovies

There's a filter feature on most sites.

hypedz

That requires no AI, even though the label is slapped on here

WHIZBEEOO

Which LLM do you use for local inference? Which quantization, and how much VRAM does it use? I've used Llama 3.1 8B on Ollama and got poor results on the same task; only the paid models gave me good results.

dibu

Do you need to get permission to scrape the page?

myWayIn

What tutorial? 😂

Step 1. Download the page
Step 2. Feed to LLM

Very cool idea, but future generations' programming skills are COOKED

MrDublem

When something is as easy to parse as that DOM content, wouldn't it be WAY cheaper to use Python code instead of an LLM to parse it? I mean, like 1000x cheaper?

MechPaul
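
This commenter's cost argument can be made concrete. When the target markup is stable, a few lines of deterministic parsing extract the same fields at essentially zero marginal cost per page, with no hallucination risk. The HTML shape and field names here are assumptions for illustration:

```python
import re

# Sample of the kind of stable, predictable markup being discussed.
html = '<li class="item"><span class="name">Widget</span><span class="price">$9.99</span></li>'

# A fixed pattern is brittle against markup changes — the trade-off the
# LLM approach avoids — but it costs nothing per page and never invents data.
m = re.search(r'class="name">([^<]+)</span><span class="price">([^<]+)<', html)
print(m.group(1), m.group(2))  # → Widget $9.99
```

The realistic middle ground many scrapers land on: use deterministic code for stable sites, and fall back to an LLM only when the markup is unknown or keeps changing.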

Must be amazing for somebody who can only code with AI.

Marcin

Can someone please describe what is happening here?

tejeshwar.p

"This works on any website"? False. It only works on server-side rendered websites.

Try it on an SPA built with React, for example.

CodeAbstract
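
The SPA objection above comes down to what a plain GET actually sees: for a client-rendered app, the response body is often just an empty mount node plus a script bundle, so there is nothing to parse until a real browser executes the JavaScript. A heuristic sketch of that check — the function name and regexes are assumptions, not any real library's API:

```python
import re

def looks_client_rendered(html: str) -> bool:
    """Heuristic: if the <body> holds no visible text once scripts and
    tags are stripped, the page likely needs a real browser to render."""
    body = re.search(r"<body[^>]*>(.*)</body>", html, re.S)
    content = body.group(1) if body else ""
    content = re.sub(r"<script.*?</script>", "", content, flags=re.S)
    visible = re.sub(r"<[^>]+>", "", content).strip()
    return visible == ""

# Typical SPA shell vs. a server-rendered page.
spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
server_page = '<html><body><h1>Prices</h1><p>$9.99</p></body></html>'
print(looks_client_rendered(spa_shell))   # → True
print(looks_client_rendered(server_page)) # → False
```

This is also why the pinned reply stresses that the tool drives a real browser: rendering first is what makes client-side apps scrapeable at all.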