Web Scraping for LLM in 2024: Jina AI Reader API, Mendable Firecrawl, and Crawl4AI and More

In this video, we look at various tools for web scraping, both free and paid. Learn how to scrape data from web pages and PDFs using Beautiful Soup, the Reader API from Jina AI, and Firecrawl from Mendable. We also discuss more advanced web scraping solutions such as ScrapeGraphAI and Crawl4AI. Aimed at people building LLM applications, this video provides practical examples and code demonstrations. Subscribe for more tutorials on building LLM applications and tools!

#webscraping #llm #parsing

TIMESTAMPS
00:00 Introduction to Data Scraping Series
00:21 Challenges of Web Data
01:32 Overview of Web Scraping Tools
01:59 Example Web Pages for Scraping
03:05 BeautifulSoup: The Baseline Approach
05:05 Reader API: Jina AI
08:21 Firecrawl: An Alternative Tool
10:42 Crawl4AI and ScrapeGraphAI
12:13 Conclusion and Next Steps

Comments

Thanks for mentioning ScrapeGraphAI. I'm one of the co-founders. We have implemented new features, such as a code generator for scraping that minimizes the number of LLM calls on sites that share a structure across pages. We are preparing something big related to KG, stay tuned :))))

lurensss

Thanks for mentioning Crawl4AI! I'm adding some new features, such as extracting all media tags (video, image, audio), Breadth-First Search (BFS) crawling, and more. My aim is to generate quality data without relying on large language models (LLMs). I think firing up GPUs to run a model with billions of parameters just to crawl data from a page is a bit over the top. Developers can use LLMs themselves once they have the right raw data from web sources.

unclecode
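For readers unfamiliar with the BFS crawling mentioned in the comment above, here is a minimal sketch of the general technique. It is not Crawl4AI's implementation: `bfs_crawl`, `get_links`, and the toy `SITE` link graph are hypothetical names used only for illustration, and a real crawler would fetch each page and extract its links instead of reading them from a dictionary.

```python
from collections import deque
from typing import Callable, Iterable


def bfs_crawl(start: str, get_links: Callable[[str], Iterable[str]], max_pages: int = 100) -> list[str]:
    """Visit pages level by level, starting from `start`, up to `max_pages`."""
    seen = {start}          # URLs already queued, to avoid revisiting
    queue = deque([start])  # FIFO queue gives breadth-first order
    order = []              # URLs in the order they were visited
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Toy in-memory link graph standing in for real pages (assumption).
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/"],
    "/c": [],
}

visited = bfs_crawl("/", lambda url: SITE.get(url, []))
print(visited)
```

Pages closest to the start URL are visited first, which is why BFS is a common choice when a crawl is capped at a fixed number of pages.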

Yes PLEASE, do videos on Crawl4AI and ScrapeGraphAI, and thank you for everything you do and your time 🙏

mjacfardk

I just use Selenium WebDriver and JavaScript or jQuery to interact with pages and get the parts I want. If they use Cloudflare or other bot blocking, you can run JS in the console, use the copy command, and then paste the result into a txt file.

TimTruth

For the Jina Reader API, the key is free for 1 million tokens, which for me was about 570 sites. Then you pay 10 for 500 million tokens, which is worth about 250k sites. That is totally insane; just pay the tiny amount for much better rate limits.

jarad
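The estimate in the comment above can be checked with a few lines of arithmetic. The figures are the commenter's own (1 million tokens covering about 570 sites, and a 500 million token tier), not official pricing, and the variable names are made up for this sketch.

```python
# Figures quoted in the comment (assumptions, not official pricing).
FREE_TOKENS = 1_000_000      # free token allowance
FREE_SITES = 570             # sites the commenter scraped with it
PAID_TOKENS = 500_000_000    # tokens in the paid tier mentioned

# Average tokens consumed per site, based on the free-tier experience.
tokens_per_site = FREE_TOKENS / FREE_SITES

# Sites the paid tier would cover at the same average page size.
estimated_sites = PAID_TOKENS * FREE_SITES // FREE_TOKENS

print(f"about {tokens_per_site:.0f} tokens per site")
print(f"about {estimated_sites:,} sites for {PAID_TOKENS:,} tokens")
```

At the same average page size, this works out to roughly 285,000 sites, so the commenter's figure of 250k is a slightly conservative round number.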

Nice comparison! Please continue work on scraping for AI applications. Hot topic!

beemerrox

Thank you so much for sharing this valuable information. It is absolutely helpful.

ahassan

Great review. Please do a review on ScrapeGraphAI. Maybe a comparison to Uncle Code's Crawl4AI? I like Crawl4AI and hope UC incorporates PDF options.

GetzAI

Scrapegraph is pretty amazing, highly recommended

jcksn

Can you make a detailed video on scrapegraphai? It’s kinda buggy right now for me

AJ-lgzr

Thank you. It would be great if you could dive deeper into ScrapeGraphAI, specifically the knowledge graph feature.

SeeFoodDie

The android in the thumbnail looks like he's DJing. Like he's ready to drop a sick beat...NOW!

john_blues

I need these materials very much. Can you share the code and API, brother?

planetgamecommunity

Thank you so much for sharing this valuable information. It is absolutely helpful. But, as far as Jina AI is concerned, is it possible to specify in the code the number of pages that I want to scrape? Sometimes the PDF file has more than 500 pages.

ahassan

Do any of these solutions work on sites you have to log in to? You can give them a url, but if the site requires you to log in, you will not be able to scrape further.

chuckcarlson

Probably a silly question, but how is this complicated process better than simply copying and pasting from the URL?

stefleur

Are there any scrapers available for LinkedIn and Instagram?

ppp

We must create order from the messiness! 😎🤖

thesimplicitylifestyle

How to scrape the web for LLM in 2024: Jina AI (Reader API), Mendable (firecrawl) and Scrapegraph-ai

This AI Agent can Scrape ANY WEBSITE!!!

Scrape any website with OpenAI Functions & LangChain

Industrial-scale Web Scraping with AI & Proxy Networks

Web Scraping with GPT-4 Vision AI + Puppeteer is Mind-Blowingly EASY!

Scrape Graph AI Setup Web Scraping Easy With LLM All Run Locally

Scrape Any Website with AI Locally and Free - ScrapeGraphAI

How I Used RAG with Llama 3.1 to Scrape & Summarize Google Trends Data | Streamlit Web App

Web scraping with Large Language Models (LLM)-AnthropicAI + LangChainAI

Web Scraping AI AGENT, that absolutely works 😍

“Wait, this Agent can Scrape ANYTHING?!” - Build universal web scraping agent

Python AI Web Scraper Tutorial - Use AI To Scrape ANYTHING

ScrapeGraphAI - REVOLUTION in WEB SCRAPING!!!

Scrape Any Website using llama3+Ollama+ScrapeGraphAI | Fully Local + Free #ai #llm

Am I going to jail for web scraping?

How To Use ChatGPT To Fully Automate Web Scraping

Build a RAG LLM Web Scraping API with BUN.js in 11 Minutes

Scrape ANY Website with AI For Free | Best AI Tools

How to Scrape and Extract Data with Langchain GPT Function Calling

Web scraping Using LLMs, AI Agent, and Crewai

AI Web Scraping Simplified For Everyone

This will change Web Scraping forever.

LLM-powered tool for web scraping #ai #chatgpt #engineering