Fully local RAG agents with Llama 3.1

With the release of Llama 3.1, it's increasingly possible to build agents that run reliably and locally (e.g., on your laptop). Here, we show how to build reliable local agents from scratch using LangGraph and Llama 3.1-8b. We build a simple corrective RAG agent with Llama 3.1-8b and compare its performance to larger models, Llama 3-70b and GPT-4o. We test our Llama 3.1-8b agent on a corrective RAG challenge and report performance and latency versus a few competing models. On our small, toy challenge, Llama 3.1-8b performs on par with much larger models, with only slightly increased latency. Overall, Llama 3.1-8b is a strong option for local execution and pairs well with LangGraph for implementing agentic workflows.

Blog post:

Ollama:

Code:
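
For orientation, here is a minimal sketch of the corrective RAG loop described above. The node bodies are stubs and the node names are illustrative, not the exact code from the video; it assumes the langgraph package:

from typing import List, TypedDict
from langgraph.graph import END, StateGraph

class GraphState(TypedDict):
    question: str
    documents: List[str]
    generation: str
    web_search_needed: bool

def retrieve(state):
    # Fetch candidate chunks from the local vector store.
    return {"documents": ["...retrieved chunks..."]}

def grade_documents(state):
    # Ask the local LLM to grade relevance; flag a web search if nothing passes.
    return {"web_search_needed": len(state["documents"]) == 0}

def web_search(state):
    # Fall back to a web search tool and append the results to the context.
    return {"documents": state["documents"] + ["...web results..."]}

def generate(state):
    # Generate the final answer from the graded context.
    return {"generation": "..."}

def decide(state):
    return "web_search" if state["web_search_needed"] else "generate"

graph = StateGraph(GraphState)
graph.add_node("retrieve", retrieve)
graph.add_node("grade_documents", grade_documents)
graph.add_node("web_search", web_search)
graph.add_node("generate", generate)
graph.set_entry_point("retrieve")
graph.add_edge("retrieve", "grade_documents")
graph.add_conditional_edges("grade_documents", decide,
                            {"web_search": "web_search", "generate": "generate"})
graph.add_edge("web_search", "generate")
graph.add_edge("generate", END)
app = graph.compile()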
Comments

The link to code on GitHub is broken. Can you please fix it?

MansA-nl

I think there is a missing step in the RAG flow. If the user knows he is "talking" to some documents, he might prompt "What is the summary?". In this case, the grader will always answer "no", and the web search will be useless. There has to be an additional step to evaluate the question: if it is similar to the one I just mentioned, then you would simply fetch a block of text from those documents and send it to the LLM to summarize.

I have a BERT classifier built specifically for this on HF:
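
Edit: as a sketch, that routing step could sit in front of the grader as a LangGraph conditional edge. Something like this, assuming langchain-ollama; the prompt is just illustrative, and a fine-tuned classifier like mine could replace the LLM call:

from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.1", temperature=0)

def route_question(state):
    # Route "whole-document" questions straight to a summarize node;
    # everything else takes the normal retrieve -> grade path.
    verdict = llm.invoke(
        "Does this question ask about the documents as a whole "
        "(e.g. 'What is the summary?') rather than a specific fact? "
        "Answer yes or no.\n\nQuestion: " + state["question"]
    )
    return "summarize" if "yes" in verdict.content.lower() else "retrieve"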

cnmoro

Llama will likely make its way into online AI products (it already has). But until someone builds a one-click Llama download and install, the general public will likely never run a local AI. And they will certainly never dive deep into coding just to build simple agents; it is just way over the heads of most general computer users. And if a one-click install is not done soon, people will gravitate toward the online subscription proprietary AI offerings (OpenAI, Claude, Gemini, etc.) and never look back.
I think that is what really killed Linux's chances of competing in the OS space. Mac and Windows, even in the '90s, were basically a one-click installation process, whereas Linux used the command line, bin this, bash that, and installation was cumbersome and not easy... I know because I did it in the mid '90s. People will go for what is easy (Mac, Windows, an online AI model), and once they are hooked on a particular AI model, it will be darn hard to get them to change.
It's a real shame, because Llama is a pretty terrific LLM, but local installation is just a nightmare for the majority of the general computing public.

dbreardon

Image embeddings are not working; text is fine. Please cover multimodal RAG with LLaVA!

antonpictures

A few things have to change:

1. Use:
from langchain_ollama import OllamaEmbeddings  # instead of: from langchain_nomic.embeddings import NomicEmbeddings
...
embedding=OllamaEmbeddings(model="nomic-embed-text"),
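
For context, wiring the swapped-in embeddings into a local vector store might look like this. Chroma is just an example store, doc_splits is assumed to be your chunked documents, and the embedding model has to be pulled first (ollama pull nomic-embed-text):

from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=doc_splits,  # doc_splits: your chunked documents (assumed defined earlier)
    embedding=OllamaEmbeddings(model="nomic-embed-text"),
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})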

MohammedAlshayeb-rm

Is it just me, or is the code not accessible?

qactus

Can I run a 405B LLM with 8 GB of RAM? 🤣

farnsworth

I really like the LangSmith test section in the package. Great job!

nachoeigu

I got an error, "ValueError: Node `retrieve` is not reachable", while running the example code.
Can anyone help me figure out what happened?
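
Edit: from what I can tell, LangGraph validates the graph at compile time and raises this error when a node has no path from the entry point. Adding one of these before compile() should fix it, assuming your StateGraph object is named graph:

graph.set_entry_point("retrieve")  # make "retrieve" the entry node, or:
# graph.add_edge("some_node", "retrieve")  # ...give it an incoming edge instead
app = graph.compile()  # compile() re-runs the reachability check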

kaneyxx

Why always OpenAI embeddings? Why not use FAISS and an open-source one instead? 😊

DhirajPatra

Thanks so much for this open information ❤

chukwuinnocent

Thanks for your great video. Would you also recommend Llama 3.1 for RAG over documents in German? Whenever we tried this, the results were much worse than with LLMs like GPT-4o or GPT-4o-mini. And could you explain why you are using the OpenAI embeddings? If I want to use this demo as a RAG app for asking questions about local documents, do I only have to replace the WebBaseLoader with a DocumentLoader?
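
Edit: for local files, I assume the only required change is the loader; something like this, where PyPDFLoader and the file path are just examples:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load a local document instead of fetching web pages with WebBaseLoader.
docs = PyPDFLoader("my_document.pdf").load()

# Chunk it the same way the web content was chunked.
doc_splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200
).split_documents(docs)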

uwegenosdude

Thank you for sharing this informative video and its hands-on code. I've run into this connection error: "Error running target function: [WinError 10061] No connection could be made because the target machine actively refused it". Can anyone please guide me on how to fix it?
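
Edit: WinError 10061 seems to just mean nothing is listening on the port, i.e. the Ollama server is not running. Start it with "ollama serve"; a quick probe of the endpoint tells you whether the server is up (the URL assumes Ollama's default port):

import urllib.request

try:
    with urllib.request.urlopen("http://localhost:11434") as resp:
        print(resp.read().decode())  # prints "Ollama is running" when up
except OSError:
    print("Ollama is not reachable - start it with: ollama serve")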

ElaheKhatibi-qj

Any chance you might compare the hosted-API Llama 3.1 405b and Mistral Large 2 123b on the same evals?

aaagaming

I find the fascination with parameter counts boring. What I would like is a way to measure how much data a model can *hold*. Is there anything like that out there?

malikrumi

I would like to see an example of how to use this to develop long texts that follow a structure or script, without losing coherence, by re-evaluating progress as it goes.

alitomix

Oohh nice! I had some issues with llama3-groq-tool-use; even after pulling it with Ollama, it kept returning an empty list instead of the actual tool calls. Just tested this code, though, and it works great! Love it! Thanks!!! Love the videos from the channel!

automatalearninglab

thanks for this! Now get off the toilet and go put some clothes on

Ronaldograxa

Nice. I need to test this as well on a more complicated agent setup. I had a case where some models would not complete: they ran into loops and hit too many errors trying to call tools... I have to give 3.1 a go at it.

jwickerszh

I really like the way you explain it; it makes the concepts easy to learn.

aaagaming