Scrape Any Website using llama3+Ollama+ScrapeGraphAI | Fully Local + Free #ai #llm

In a constantly evolving web landscape, ScrapeGraphAI introduces a new era of web scraping. This open-source library leverages Large Language Models (LLMs) to offer flexible and low-maintenance scraping solutions for developers.

ScrapeGraphAI is a web scraping Python library that uses LLMs and direct graph logic to create scraping pipelines for websites and local documents (XML, HTML, JSON, etc.). The SmartScraperGraph class is one of the default scraping pipelines. It uses a direct graph implementation in which each node has its own function, from retrieving HTML from a website to extracting relevant information based on your query and generating a coherent answer.
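
To make the "each node has its own function" idea concrete, here is a toy sketch of a linear node pipeline in plain Python. It is not ScrapeGraphAI's implementation: the node names, the shared state dictionary, and the keyword match standing in for the LLM step are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def fetch_node(state):
    # Hypothetical stand-in for the "retrieve HTML" step: the HTML is
    # already supplied as the source, so no network request is made.
    state["html"] = state["source"]
    return state


def parse_node(state):
    # Turn raw HTML into a list of visible text chunks.
    extractor = TextExtractor()
    extractor.feed(state["html"])
    extractor.close()
    state["chunks"] = extractor.parts
    return state


def answer_node(state):
    # Hypothetical stand-in for the LLM step: keep the chunks that mention
    # any word of the prompt. A real pipeline would call a model here.
    keywords = [word.lower() for word in state["prompt"].split()]
    state["answer"] = [
        chunk for chunk in state["chunks"]
        if any(keyword in chunk.lower() for keyword in keywords)
    ]
    return state


def run_pipeline(nodes, state):
    # Each node reads from and writes to the shared state, in order.
    for node in nodes:
        state = node(state)
    return state
```

Calling `run_pipeline([fetch_node, parse_node, answer_node], {"source": html, "prompt": "pendulum"})` passes one state dictionary through the three nodes, which is the general shape the description attributes to SmartScraperGraph.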

In this video, we explore how to scrape website content using Llama 3, Ollama, and ScrapeGraphAI, all running locally. Note that a minimum of 15 GB of RAM is required for this application. This approach can also be applied to your local documents such as XML, HTML, and more.
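
A minimal sketch of what such a local setup might look like is shown below. It follows the configuration style from the ScrapeGraphAI README as of 2024; the exact config keys, the `ollama/nomic-embed-text` embedding model, and the helper names `build_graph_config` and `scrape` are assumptions and may not match the video or the library version you have installed.

```python
def build_graph_config(model="ollama/llama3",
                       embedding_model="ollama/nomic-embed-text",
                       base_url="http://localhost:11434"):
    """Build a config dict pointing ScrapeGraphAI at a local Ollama server.

    11434 is Ollama's default port. The key names mirror the library's
    2024 README and are an assumption for other versions.
    """
    return {
        "llm": {
            "model": model,
            "temperature": 0,
            "format": "json",  # ask Ollama for JSON-formatted output
            "base_url": base_url,
        },
        "embeddings": {
            "model": embedding_model,
            "base_url": base_url,
        },
        "verbose": True,
    }


def scrape(prompt, source, config):
    """Run a SmartScraperGraph over `source` (a URL or local document)."""
    # Imported lazily so the config helper works without the library installed.
    from scrapegraphai.graphs import SmartScraperGraph

    graph = SmartScraperGraph(prompt=prompt, source=source, config=config)
    return graph.run()


# Example usage (requires `ollama serve` running with the models pulled;
# the URL is a placeholder):
# result = scrape("List all article titles on the page",
#                 "https://example.com",
#                 build_graph_config())
# print(result)
```

Keeping the config in a small helper makes it easy to swap the model name or server address without touching the scraping call.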

Let's dive into it!

#WebScraping #PythonLibrary #ScrapeGraphAI #LLM #OpenSource #Tutorial #DataExtraction #LocalDocuments

Comments

First!! You're an absolute hero for sharing this

waynedayata

Doesn't work on Mac (you should say that your video is for Windows). Error when trying to install requirements: "AttributeError: module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'?"

themaxgo

How can you go into the detail page of a list item? Sometimes a URL that returns a list doesn't have all the details and you need to click on 'more' or something. Is there a way to do this?

georgeluyckx

Is there any way to increase token limits? Some sites after scraping are more than 1024 tokens, and I can't seem to pass any parameter to change that in the graph config. I tried different models that are supposed to support more than 1024 tokens too. Is that some sort of ScrapeGraphAI limitation, maybe?

derpnaifu

Dude, I love your VS Code theme, it's amazing. Can you tell me its name? Great video.

RaulAlmao-yvqx

"module 'pkgutil' has no attribute 'ImpImporter'. Did you mean: 'zipimporter'? ", getting this error

vishalranjan

I am getting an error while installing from the requirement.txt file:
ERROR: Could not find a version that satisfies the requirement playwright (from versions: none)

albaky

Short, maybe dumb, question: is it possible to scrape complete sites, i.e. also follow links and scrape their content?

danielniehoerster

For me it got stuck at "processing chunks 0%" and I'm not getting any output. How long did it take to run for you?

ponsekhagurusamy

Hello Sir, thanks for the guide. I can run it perfectly. But how can I make the result log save to a local folder automatically?

wonghector

After running app.py, I noticed a dramatic reduction in my free hard disk space. Is it possible to reclaim the space that the program used and return to normal disk space? Thank you.

Ashort

If the page URL is behind the company firewall, how could I do it?
For example, for the company Confluence page, can I scrape it via the Confluence API using this tool?

alecd

Trying to follow along on a Mac Pro M1 and getting the error below. Could anyone kindly advise? Many thanks!
Traceback (most recent call last):
  File "/Users/user/scapegraphai/app.py", line 32, in <module>
    result = smart_scraper_graph.run()
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/scrapegraphai/graphs/smart_scraper_graph.py", line 109, in run
    self.final_state, self.execution_info = self.graph.execute(inputs)
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/scrapegraphai/graphs/base_graph.py", line 106, in execute
    result = current_node.execute(state)
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/scrapegraphai/nodes/rag_node.py", line 89, in execute
    retriever = FAISS.from_documents(
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_core/vectorstores.py", line 550, in from_documents
    return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/vectorstores/faiss.py", line 930, in from_texts
    embeddings =
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 211, in embed_documents
    embeddings =
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in _embed
    return for prompt in iter_]
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 199, in <listcomp>
    return for prompt in iter_]
  File "/Users/user/scapegraphai/sga/lib/python3.11/site-packages/langchain_community/embeddings/ollama.py", line 173, in _process_emb_response
    raise ValueError(
ValueError: Error raised by inference API HTTP code: 500, {"error":"[0] server cpu not listed in available servers map[]"}

AC-gotp

Is there any way to pass in multiple URLs to scrape?

madhudson