Gemma 2 - Local RAG with Ollama and LangChain

In this video I go through setting up a basic, fully local RAG system with Ollama and the new Gemma 2 model.
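
A minimal sketch of the kind of pipeline the video covers, assuming the langchain-community integrations, a Chroma vector store, and a local embedding model pulled into Ollama (nomic-embed-text here); the paths, chunk sizes, and prompt wording are illustrative, not the repo's exact code.

# Minimal local RAG sketch: Gemma 2 served by Ollama + LangChain + Chroma.
# Illustrative only; file paths, chunk sizes and the prompt are assumptions.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load and chunk the local text files.
docs = DirectoryLoader("data", glob="**/*.txt", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# 2. Embed the chunks locally and index them in Chroma.
embeddings = OllamaEmbeddings(model="nomic-embed-text")
retriever = Chroma.from_documents(chunks, embedding=embeddings).as_retriever(search_kwargs={"k": 4})

# 3. Answer questions with Gemma 2, grounded only in the retrieved chunks.
llm = Ollama(model="gemma2")
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only this context:\n{context}\n\nQuestion: {question}"
)

def format_docs(docs):
    return "\n\n".join(d.page_content for d in docs)

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What are these documents about?"))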

Interested in building LLM Agents? Fill out the form below

👨‍💻Github:

⏱️Time Stamps:
00:00 Intro
00:09 Ollama: Gemma 2 Model
01:35 Demo: RAG running locally
Comments

Thank you for the video. Vote for the next video - Fully Local Multimodal RAG

relaniumz

Just wanted to give a big thumbs up to this, although I haven't watched the whole thing yet 😀. There are so many interesting things you can do with local RAG, and LangChain is very straightforward. I did something similar with Ollama's Llama 3 model. Very interested in trying the new Llama models that should be available soon.

toadlguy

Useful videos. Keep on uploading and make Aussies proud of you.

mrnakomoto

Memory, LangChain agents, and a streaming UI next, please. Thanks for the very useful video!

aa-xnhc

Amazing tutorial, exactly what I was looking for! Running it with a few text documents, the results are great. Do you have any recommendations for making the QA faster? A different model or libraries?

hrk

LangChain and LlamaIndex are really just boilerplate IMHO; they create more problems than they solve with their over-abstraction. Can you show a vanilla example of how to do RAG?

flat-line
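
A minimal sketch of the framework-free approach asked about above, using only the ollama Python client and NumPy; the chunk list, embedding model, and cosine-similarity retrieval are illustrative assumptions, not code from the video.

# Vanilla RAG with no framework: embed, retrieve by cosine similarity, then ask the LLM.
import numpy as np
import ollama

EMBED_MODEL = "nomic-embed-text"  # assumed local embedding model
CHAT_MODEL = "gemma2"

chunks = [
    "Gemma 2 is an open-weights model family from Google.",
    "Ollama serves local models over a simple HTTP API.",
    "Chroma is a local vector store often used for RAG.",
]

def embed(text):
    # Ask the local embedding model for a vector.
    return np.array(ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"])

# Index: embed every chunk once up front.
index = np.stack([embed(c) for c in chunks])

def retrieve(question, k=2):
    # Cosine similarity between the question vector and every chunk vector.
    q = embed(question)
    sims = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

def answer(question):
    context = "\n".join(retrieve(question))
    reply = ollama.chat(
        model=CHAT_MODEL,
        messages=[{"role": "user",
                   "content": f"Use only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return reply["message"]["content"]

print(answer("What is Ollama?"))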

Thanks for showing Gemma 2 and Ollama. It would be nice to see this with Mesop, maybe in combination with LangSmith for debugging?

henkhbit

Great video again. A question, though: I can see you are more in favour of LangChain, but what are your thoughts on AutoGen and Teachable Agents for doing something similar? And, in general, what are your thoughts on AutoGen and its agentic model?

emmanuelauffray

Hey Sam! For now Gemma 2 is still broken in Ollama, which doesn't yet include the latest llama.cpp fixes required.
It's a tokenizer issue: <start_of_turn> and <end_of_turn> are interpreted as text instead of special tokens, and of course things don't really work as expected as a result.

I believe it'll be fixed in the next Ollama update though, very soon. But it's too early for the Gemma 2 evaluations using Ollama that many people are currently running on their own or publishing in videos.

supercurioTube

Is it possible to run the embedding model on the CPU and the local LLM on the GPU?

matthewpublikum
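
One way to do what the question above asks, assuming the sentence-transformers backed HuggingFaceEmbeddings integration: pin the embedder to the CPU and let Ollama keep the generation model on the GPU (Ollama decides its own GPU offload). Running the embedder outside Ollama is one simple way to split devices, as below.

# Sketch: embeddings on the CPU, generation on the GPU.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import Ollama

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # assumed embedding model
    model_kwargs={"device": "cpu"},                       # force the embedder onto the CPU
)

llm = Ollama(model="gemma2")  # served by Ollama, which uses the GPU when one is available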

I see that it is working quite fast on a Mac Mini.
But what are the RAM requirements for the model and Chroma? Does it require a GPU for acceptable performance?

You've mentioned that the choice of embedder is important. As I understand it, matching vector dimensionality is not required, since the embeddings are only used during indexing and vector search. But what about "semantic" compatibility between the embedder and the LLM? I can imagine that an embedder could map semantic meaning in its vector space differently from Gemma or Llama. Is it even possible to compare embedders to ensure you use the best possible one for a given model?

SwapperTheFirst

Hey Sam, can you explain why your prompt template always seems to have a different structure? By that I mean, in this case you wrote <bos><start_of_turn>user\n at the start and then <end_of_turn> towards the end. Does each LLM have its own way of writing its prompt template? If so, what and where do you refer to when you want to do prompt engineering for the LLM you're using?

yazanrisheh
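
On the question above: yes, each model family defines its own chat markup, usually documented on its model card and baked into its Ollama template. A sketch of Gemma's turn format built by hand (the model card is the authoritative reference):

# Gemma-style chat markup, assembled manually for illustration.
# Other families (Llama 3, Mistral, ...) use different control tokens,
# which is why prompt templates look different per model.
question = "What is retrieval-augmented generation?"

gemma_prompt = (
    "<bos><start_of_turn>user\n"
    f"{question}<end_of_turn>\n"
    "<start_of_turn>model\n"
)

When you go through Ollama's chat endpoint or LangChain's chat wrappers, the model's template inserts these tokens for you, so hand-written markup is only needed for raw completion-style calls.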

I find Chroma is not very suitable for local RAG: it sends telemetry data back to its devs, and one needs to set anonymized_telemetry=False to keep it quiet. Also, running Ollama with some of the tools mentioned behind a firewall/proxy can be a challenge.

PestOnYT
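
On the telemetry point above, a sketch of switching it off when constructing the store, assuming chromadb's Settings object and LangChain's Chroma wrapper; the collection name, path, and embedding model are placeholders.

# Disable Chroma's anonymized telemetry for a fully offline setup.
from chromadb.config import Settings
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

vectorstore = Chroma(
    collection_name="local_rag",
    embedding_function=OllamaEmbeddings(model="nomic-embed-text"),
    persist_directory="./chroma_db",
    client_settings=Settings(anonymized_telemetry=False),  # keep Chroma quiet
)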

Hello, I'm French, sorry for the translation. Really good job! I have a question: how do you add PDFs in addition to TXT files at the top of your code? **/*.txt, *.pdf or something else? Thank you.

bluelegend
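
One way to handle the PDF question above, assuming the pypdf-backed PyPDFLoader: run two DirectoryLoader globs and concatenate the results before chunking.

# Load both .txt and .pdf files from a folder tree, then combine them.
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader

txt_docs = DirectoryLoader("data", glob="**/*.txt", loader_cls=TextLoader).load()
pdf_docs = DirectoryLoader("data", glob="**/*.pdf", loader_cls=PyPDFLoader).load()

docs = txt_docs + pdf_docs  # feed the combined list into the splitter and vector store as usual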

How does this compare to Microsoft's recently open-sourced GraphRAG? By the way, there are tutorials for running GraphRAG with Ollama (two different ways to do it: one is a "hack" that requires changing the GraphRAG Python lib to make it work with Ollama, the other requires LM Studio), with two types of querying: "global", which always works fine, and "local", which often fails (with various error messages, for various reasons).

themaxgo

What are the system requirements?
Do we need a GPU with a certain amount of VRAM?

ShravanKumar

It doesn't work very well, but it is informative.

HmzaY

Can you share the index code? I don't see it on GitHub.

matthewchung
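
The indexing step typically looks like the sketch below (a guess, not the repo's actual code): build the Chroma index once with a persist_directory, then reopen it at query time without re-embedding. The model name and paths are assumptions.

# One-off index build: chunk, embed, and persist to disk.
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_text_splitters import RecursiveCharacterTextSplitter

embeddings = OllamaEmbeddings(model="nomic-embed-text")

docs = DirectoryLoader("data", glob="**/*.txt", loader_cls=TextLoader).load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Writes the embedded chunks to ./chroma_db.
Chroma.from_documents(chunks, embedding=embeddings, persist_directory="./chroma_db")

# Later, at query time: reopen the persisted index without re-embedding anything.
vectorstore = Chroma(persist_directory="./chroma_db", embedding_function=embeddings)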

Thanks for sharing your experience.
I want to run this model on my computer, so I wrote a Modelfile like the one below:

FROM gemma-2-9b-it-Q6_K_L.gguf

TEMPLATE """
<start_of_turn>user:
{{prompt}}<end_of_turn>
<start_of_turn>model:
"""

PARAMETER stop <end_of_turn>

Then, to create the model in Ollama, I ran this command:

ollama create gemma-2-9b-it-Q6_K_L -f

And to run the model, I ran this command:

ollama run gemma-2-9b-it-Q6_K_L:latest

Finally, I got this error message:

Error: llama runner process has terminated: signal: aborted (core dumped)

How did you manage to run this model on Ollama?
Thank you.

kungmo
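
On the question above: the "llama runner process has terminated" crash is usually the Ollama-version problem flagged in an earlier comment (builds that predate llama.cpp's Gemma 2 support) rather than the Modelfile itself. Once Ollama is updated, a Modelfile roughly like the sketch below follows Ollama's template conventions for Gemma (note {{ .Prompt }} rather than {{prompt}}, and no colons after the turn markers); treat it as a sketch, not a verified recipe.

FROM ./gemma-2-9b-it-Q6_K_L.gguf

TEMPLATE """<start_of_turn>user
{{ .Prompt }}<end_of_turn>
<start_of_turn>model
"""

PARAMETER stop "<end_of_turn>"

Then create and run it with:

ollama create gemma-2-9b-it-Q6_K_L -f Modelfile
ollama run gemma-2-9b-it-Q6_K_L:latest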