Reliable, fully local RAG agents with LLaMA3.2-3b

Meta's LLaMA3.2 release includes a set of compact models designed for on-device use cases, such as locally running assistants. Here, we show how LangGraph can enable this kind of local assistant by building a multi-step RAG agent that combines ideas from three advanced RAG papers (Adaptive RAG, Corrective RAG, and Self-RAG) into a single control flow. Along the way, we show that LangGraph makes it possible to run a complex agent entirely locally.
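The control flow described above can be sketched in dependency-free Python. All function bodies below are illustrative stubs (assumptions standing in for the local LLaMA3.2 graders, the vectorstore retriever, and web search); the real version wires these steps into a LangGraph `StateGraph`.

```python
def route_question(q):
    # Adaptive RAG: route the question to the vectorstore or to web search
    return "vectorstore" if "agent" in q else "web_search"

def retrieve(q):
    return ["doc about agent memory"]  # stub vectorstore retriever

def web_search(q):
    return ["web result for: " + q]    # stub web search tool

def grade_documents(q, docs):
    # Corrective RAG: keep only documents relevant to the question
    return [d for d in docs if any(w in d for w in q.lower().split())]

def generate(q, docs):
    return "answer based on: " + "; ".join(docs)

def grounded(answer, docs):
    # Self-RAG: accept the answer only if it is grounded in the documents
    return any(d in answer for d in docs)

def run(question, max_retries=3):
    docs = (retrieve(question) if route_question(question) == "vectorstore"
            else web_search(question))
    answer = "I don't know."
    for _ in range(max_retries):
        good = grade_documents(question, docs)
        if not good:
            docs = web_search(question)  # corrective fallback to the web
            continue
        answer = generate(question, good)
        if grounded(answer, good):
            break
    return answer

print(run("what is agent memory"))
```

Each `if`/`continue`/`break` here corresponds to a conditional edge in the graph; LangGraph makes the same routing explicit as nodes and edges rather than Python control flow.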

Code:

Llama3.2:

Full course on LangGraph:
Comments

Dude. You're the man. I've gone through most of your LangChain course and lots of the YT content. You're ... you have a knack for teaching.

christopherhartline

Awesome stuff. LangGraph is a nice framework. Stoked to build with it; working through the course now!

ytaccount

Great explanation. It would be great to do one more tutorial on multimodal local RAG, handling different chunk types like tables, text, and images, using unstructured, Chroma, and MultiVectorRetriever completely locally.

homeandr

Amazing session, with the content explained very nicely in just 30 minutes. Thanks so much!

ravivarman

The tutorial was "fully local" up until the moment you introduced Tavily 😜😉.
Excellent tutorial Lance 👍

leonvanzyl

You are amazing, as always. Thank you for sharing.

joxxen

Why did you use llama3.2:3b-instruct-fp16 instead of llama3.2:3b?

becavas

How do you do this in .py files? Since we are working in Jupyter, we can re-run the graph and re-invoke it. But when I do this in a .py file, the whole graph gets recreated, recompiled, and invoked on every invocation.

asitnayak
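The recompilation concern in the question above is usually solved by compiling once at module level and reusing the compiled app. A minimal sketch, where `build_app` is a hypothetical stand-in for the expensive `StateGraph(...).compile()` step:

```python
import functools

@functools.lru_cache(maxsize=1)
def build_app():
    # Expensive one-time work: define nodes/edges and compile the graph.
    # (Stub: a callable standing in for the compiled LangGraph app.)
    return lambda state: {"answer": state["question"].upper()}

def handle(question):
    app = build_app()  # cached after the first call, never recompiled
    return app({"question": question})

print(handle("hello")["answer"])  # HELLO
```

The same effect can be had by assigning `app = graph.compile()` at module scope so imports pay the cost exactly once.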

Great tutorial, so first of all: thank you :-) But I am not happy with the results. Here is one example:
On a PDF containing information about a filling machine, I asked the question "How is the machine emptied?"
The generated answer (simplified for this example) was "Step 1: do this, Step 2: do that, Step 4: do that."
And the answer grader decided that the answer was good. I expected the answer grader to flag that Step 3 is missing :-(


Is it possible to make an agent that, when provided with a few hundred links, extracts the info from all of them and stores it?

developer-he

Question: you have operator.add on the loop_step, but then increment the loop_step in the state too… am I wrong in thinking that would be incorrect?

beowes

If different tools require different keyword arguments, how can these be passed in for the agent to access?

sidnath

I'm a med student interested in experimenting with the following: I'd like to have several PDFs (entire medical books) from which I can ask a question and receive a factually accurate, contextually appropriate answer—thereby avoiding online searches. I understand this could potentially work using your method (omitting web searches), but am I correct in thinking this would require a resource-intensive, repeated search process?

For example, if I ask a question about heart failure, the model would need to sift through each book and chapter until it finds the relevant content. This would likely be time-consuming initially. However, if I then ask a different question, say on treating systemic infections, the model would go through the entire set of books and chapters again, rather than narrowing down based on previous findings.

Is there a way for the system to 'learn' where to locate information after several searches? Ideally, after numerous queries, it would be able to access the most relevant information efficiently without needing to reprocess the entire dataset each time—while maintaining factual accuracy and avoiding hallucinations.

marcogarciavanbijsterveld

Is there an elegant way to handle recursion errors?

hari

You make the LLM do all the hard work of candidate filtering.

hensonk

That's a great tutorial that shows the power of LangGraph. It's impressive you can now do this locally with decent results. Thank you!

SavvasMohito

Thanks, it is indeed very cool. Last time you used 32 GB; do you think this will run with 16 GB of memory?

davesabra

Thanks for the video and sample putting all these parts together. What did you use to draw the diagram at the beginning of the video? Was it generated by a DSL/config?

AlexEllis

Great video. What tool did you use to illustrate the nodes and edges in your notebook?

Togowalla

Interesting: you basically use an old-school workflow to orchestrate the steps of LLM-based atomic tasks. But what about letting the LLM execute the workflow and also perform all the required atomic tasks? That would be more like an agentic approach...

arekkusub