Scrape Any Website using llama3+Ollama+ScrapeGraphAI | Fully Local + Free #ai #llm

In a constantly evolving web landscape, ScrapeGraphAI introduces a new era of web scraping. This open-source library leverages Large Language Models (LLMs) to offer flexible and low-maintenance scraping solutions for developers.

ScrapeGraphAI is a Python web scraping library that uses LLMs and directed graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.). The SmartScraperGraph class is one of the default scraping pipelines. It is implemented as a directed graph in which each node has its own function, from retrieving HTML from a website to extracting relevant information based on your query and generating a coherent answer.
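To make the "graph of nodes" idea concrete, here is a toy sketch in plain Python. It is not ScrapeGraphAI's implementation: the Node class, run_graph, and the three stand-in functions are hypothetical names invented for illustration, and the "fetch" step returns a hard-coded HTML string instead of calling a website or an LLM.

```python
import re
from typing import Callable, Dict, Optional

State = Dict[str, str]


class Node:
    """One pipeline step: takes the shared state and returns an updated copy."""

    def __init__(self, name: str, func: Callable[[State], State]) -> None:
        self.name = name
        self.func = func
        self.next: Optional["Node"] = None  # single outgoing edge

    def then(self, other: "Node") -> "Node":
        """Add an edge from this node to `other` and return `other` for chaining."""
        self.next = other
        return other


def run_graph(entry: Node, state: State) -> State:
    """Walk the edges from the entry node, passing the state through each node."""
    node: Optional[Node] = entry
    while node is not None:
        state = node.func(state)
        node = node.next
    return state


# Hypothetical stand-ins for "retrieve HTML", "extract text", "generate answer".
def fetch(state: State) -> State:
    return {**state, "html": "<h1>Projects</h1><p>Example Project</p>"}


def parse(state: State) -> State:
    text = re.sub(r"<[^>]+>", " ", state["html"])  # strip tags
    return {**state, "text": " ".join(text.split())}  # collapse whitespace


def answer(state: State) -> State:
    return {**state, "answer": f"{state['prompt']}: {state['text']}"}


entry = Node("fetch", fetch)
entry.then(Node("parse", parse)).then(Node("answer", answer))
result = run_graph(entry, {"prompt": "List the projects"})
print(result["answer"])
```

Each node only reads and extends the shared state, so a step can be swapped (for example, reading a local file instead of fetching a URL) without touching the others.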

In this video, we explore how to scrape website content using LLaMA3, Ollama, and ScrapeGraphAI, all running locally. Note that a minimum of 15GB of RAM is required for this application. This approach can also be applied to your local documents such as XML, HTML, and more.
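A minimal sketch of what such a local setup might look like. It assumes an Ollama server on its default address (http://localhost:11434) with the llama3 model already pulled. The configuration keys follow the pattern shown in the ScrapeGraphAI README around the time of the video and may differ between library versions; the embedding model name is an assumption, and build_graph_config and scrape are hypothetical helper names.

```python
from typing import Any, Dict


def build_graph_config(
    model: str = "ollama/llama3",
    embedding_model: str = "ollama/nomic-embed-text",  # assumed embedding model
    base_url: str = "http://localhost:11434",  # Ollama's default local address
) -> Dict[str, Any]:
    """Build a config dict that points both the LLM and the embedder at local Ollama."""
    return {
        "llm": {
            "model": model,
            "temperature": 0,
            "format": "json",
            "base_url": base_url,
        },
        "embeddings": {
            "model": embedding_model,
            "base_url": base_url,
        },
        "verbose": True,
    }


def scrape(source: str, prompt: str) -> Any:
    """Run the default pipeline against a URL or local document (needs Ollama running)."""
    # Imported lazily so the config helper works without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(
        prompt=prompt,
        source=source,
        config=build_graph_config(),
    )
    return graph.run()


# Example call (commented out because it needs a running Ollama server):
# print(scrape("https://example.com", "List all headings on the page"))
```

Keeping the configuration in one helper makes it easy to point the same script at a different model or Ollama host without editing the scraping call.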

Let's dive into it!

#WebScraping #PythonLibrary #ScrapeGraphAI #LLM #OpenSource #Tutorial #DataExtraction #LocalDocuments


Comments

First!! You're an absolute hero for sharing this

waynedayata

doesn't work on mac (you should say that your vid is for windows) - error when trying to install requirements: "AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?"

themaxgo

How can you go into the detail page of each item in a list? Sometimes a URL that returns a list doesn't have all the details, and you need to click on 'more' or something. Is there a way to do this?

georgeluyckx

Is there any way to increase the token limit? Some sites are more than 1024 tokens after scraping, and I can't seem to pass any parameter in the graph config to change that. I also tried different models that are supposed to support more than 1024 tokens. Is that some sort of ScrapeGraphAI limitation, maybe?

derpnaifu

Dude, I love your VS Code theme, it's amazing. Can you tell me its name? Great video.

RaulAlmao-yvqx

"module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'? ", getting this error

vishalranjan

I am getting an error while running the requirement.txt file:
ERROR: Could not find a version that satisfies the requirement playwright (from versions: none)

albaky

Short, maybe dumb, question: is it possible to scrape complete sites, that is, to follow the links as well and scrape their content?

danielniehoerster

For me it got stuck at "processing chunks 0%" and I am not getting any output. How long did it take to run for you?

ponsekhagurusamy

Hello Sir, thanks for the guide. I can run it perfectly. But how can I make the result log save to a local folder automatically?

wonghector

After running app.py, I noticed a dramatic reduction in my hard disk space. Is it possible to reclaim the space that the program used and return to normal disk space? Thank you.

Ashort

If the page URL is behind the company firewall, how could I do it?
For example, for the company Confluence page, can I scrape it via the Confluence API using this tool?

alecd

Trying to follow along on a Mac Pro M1 and getting the error below. Could anyone kindly advise? Many thanks!
Traceback (most recent call last):
  File "/Users/user/scapegraphai/app.py", line 32, in <module>
    result = smart_scraper_graph.run()
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 109, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 106, in execute
    result = current_node.execute(state)
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/scrapegraphai/nodes/rag_node.py", line 89, in execute
    retriever = FAISS.from_documents(
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 550, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 930, in from_texts
    embeddings =
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 211, in embed_documents
    embeddings =
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in _embed
    return for prompt in iter_]
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in <listcomp>
    return for prompt in iter_]
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 173, in _process_emb_response
    raise ValueError(
ValueError: Error raised by inference API HTTP code: 500, {"error":"[0] server cpu not listed in available servers map[]"}

AC-gotp

Is there any way to pass in multiple URLs to scrape?

madhudson

Scrape Graph AI Setup Web Scraping Easy With LLM All Run Locally

This AI Agent can Scrape ANY WEBSITE!!!

How to scrape the web for LLM in 2024: Jina AI (Reader API), Mendable (firecrawl) and Scrapegraph-ai

Build a Web Scraping AI Agent with Llama 3.2 Running Locally

Python AI Web Scraper Tutorial - Use AI To Scrape ANYTHING

Build Anything with Llama 3 Agents, Here’s How

Web Scraping for LLM in 2024: Jina AI Reader API, Mendable Firecrawl, and Crawl4AI and More

Yeah but can it RUN LOCALLY?

Industrial-scale Web Scraping with AI & Proxy Networks

Discover the Hottest AI & LLM Projects: Unveiling Pykan, EfficientViT & More!