Local RAG with llama.cpp

Comments

I can already tell from the first few seconds, this guy knows his stuff and explains it really well! Thank you

LondonSoundDimension

Thank you Mark! I was today years old when I learned my M2 mini/M3 Air Macs indeed have the capacity for the Metal option. No wonder my queries drained the machine of all its RAM without generating a response anyway!
I haven't finished this one yet (had to pause to write down that Metal epiphany), but I'm definitely going to watch more of your stuff as I'm just starting out making tech videos too. Love seeing what others do!

madsciai

Thanks again for another fantastic video! Quick question: do you know the best way to format prompts when running the llama.cpp server with the chat_format chatml parameter for RAG? I'm hosting the server myself and using the OpenAI client to create completions, so everything runs locally on my machine. My current setup has a system role that includes a system prompt and the relevant context, and a user role with only the query. However, sometimes the model just returns the system prompt instead of answering the question. Any ideas why that happens or how to fix it? Thanks a lot in advance!
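
A minimal sketch of one way to lay out the messages for this kind of setup, assuming llama-cpp-python's OpenAI-compatible server started with the chatml chat format; the base_url, model name, retrieved_context and question below are placeholders, not values from the video. If the model echoes the system prompt, one thing worth trying is keeping only the instructions in the system role and moving the retrieved context into the user message alongside the question.

```python
# Hypothetical client-side sketch: querying a local llama.cpp / llama-cpp-python
# server (started with chat_format=chatml) through the OpenAI client library.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")  # placeholder URL

retrieved_context = "...text returned by your retriever..."  # placeholder
question = "What does the document say about X?"             # placeholder

response = client.chat.completions.create(
    model="local-model",  # local servers typically ignore or alias this name
    messages=[
        # Instructions plus the retrieved context in the system role...
        {"role": "system",
         "content": "You are a helpful assistant. Answer only from the context below.\n\n"
                    f"Context:\n{retrieved_context}"},
        # ...and just the question in the user role.
        {"role": "user", "content": question},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```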


Hi Mark, I don't understand why you're chunking the documents with your "chunk" function. Can't we just feed all 247 documents to the LLM to create the embeddings? Something like:
document_embeddings = llm.create_embedding([item.page_content for item in documents])
We get the embedding back for each document and that's it. Am I missing something?
Or are you doing it just to have 3 batches (100, 100 and 47) and embed them in parallel?
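
For what it's worth, a "chunk" function in this context is usually just a batching helper; the sketch below is a hypothetical stand-in, not the function from the video, and it assumes llm is a llama_cpp.Llama created with embedding=True, that load_documents is a placeholder loader, and that each document exposes a .page_content attribute. A single create_embedding call over all 247 texts also works; batching mostly keeps each call to a manageable size.

```python
from llama_cpp import Llama

# Placeholders standing in for the earlier steps of the tutorial.
llm = Llama(model_path="path/to/embedding-model.gguf", embedding=True, verbose=False)
documents = load_documents()  # hypothetical loader; items expose .page_content

def chunk(items, batch_size=100):
    """Yield successive batches of at most batch_size items (100, 100, 47 for 247 docs)."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [item.page_content for item in documents]

document_embeddings = []
for batch in chunk(texts, batch_size=100):
    result = llm.create_embedding(batch)  # one embedding call per batch
    document_embeddings.extend(d["embedding"] for d in result["data"])
```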

Sendero-ypgi

Thank you Mark, great video! I learnt a lot. May I ask one silly question? I can see you used a simpler embedding model to extract the embeddings, but later on, when you switch to the Llama 3 model, I didn't understand why the same embeddings are reused rather than regenerated by the bigger and better Llama 3 model. Does it mean that "search" is not the difficult part in RAG, but compiling the final answer is?
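
A minimal sketch of the split being asked about, assuming llama-cpp-python; the model paths, query, and retrieval step are placeholders. The embedding model is only used to turn text into vectors for the similarity search, while the generation model only ever sees the retrieved text and the question, so switching the generator to Llama 3 doesn't by itself require re-embedding (though a stronger embedding model can still improve what gets retrieved).

```python
from llama_cpp import Llama

# Small model used only for embeddings / similarity search (placeholder path).
embedder = Llama(model_path="path/to/small-embedding-model.gguf",
                 embedding=True, verbose=False)

# Larger model used only to write the final answer (placeholder path).
generator = Llama(model_path="path/to/llama-3-instruct.gguf",
                  n_ctx=4096, verbose=False)

query = "What does the report say about Q3 revenue?"  # placeholder
query_vector = embedder.create_embedding(query)["data"][0]["embedding"]

# ...compare query_vector against the stored document vectors and pick the closest chunks...
retrieved_context = "the most similar chunks, concatenated"  # placeholder

answer = generator(
    f"Answer using only this context:\n{retrieved_context}\n\nQuestion: {query}\nAnswer:",
    max_tokens=256,
)
print(answer["choices"][0]["text"])
```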

MrWerewolf

Hi Mark, I can't find your chunk function on the GitHub page you mentioned in the description. Could you help me with that? Sorry, I'm new to all this, so this might be a silly ask at the moment. Thanks a lot!

MuhammadZubair-flwd

Hi Mark, great tutorial. I have been playing around a bit and tried to use my already existing ChromaDB as a retriever. Unfortunately, simply pointing the retriever at my existing DB did not work: I received "ValueError: Requested tokens (941) exceed context window of 512". Do you happen to know how to expand the context window, or how to fix this otherwise?
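
For reference, that error usually comes from the model being loaded with the default 512-token context; a minimal sketch of raising it with llama-cpp-python is below (the model path is a placeholder, and the model itself has to support the larger window). Retrieving fewer or shorter chunks also helps stay under the limit.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # placeholder path
    n_ctx=4096,        # raise the context window from the 512-token default
    n_gpu_layers=-1,   # optional: offload layers to Metal/GPU if available
)
```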

inf-co

Hi Mark, thanks for sharing! How did you choose the embedding_llm? Is there a best practice or a guideline on how to choose it? I'm testing and I was wondering what I should use for embedding... any help would be greatly appreciated!

ElisaPiccin

Hi Mark, how much quicker should inference be when setting n_gpu_layers = 1? I am on a Mac M1 Pro with a 16GB GPU, and if I set n_gpu_layers = 1 it is actually slower than not using it. Do you have an explanation for that, or a way to check what is happening? Cheers!
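
One thing worth checking: depending on the llama.cpp build, n_gpu_layers = 1 offloads only a single layer, so most of the work still runs on the CPU and the extra copying (plus memory pressure on a 16 GB machine) can make it slower overall. A minimal sketch, assuming llama-cpp-python with a Metal build; the model path is a placeholder.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/model.gguf",  # placeholder path
    n_gpu_layers=-1,   # offload every layer rather than just one
    verbose=True,      # prints Metal/offload messages so you can confirm the GPU is used
)
```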

Sendero-ypgi

Do you know what the "correct" way is to prompt the GGUF? I have been using Llama 3 through a GGUF, Ollama, and ChatOllama, and I feel like the GGUF gives less lively answers than the Ollama versions. Do you know why this is happening? Do I need to configure more or change the prompt?
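
One common cause is the raw GGUF being prompted without the Llama 3 chat template, while Ollama applies the template (and its own default sampling settings) automatically. A minimal sketch, assuming a recent llama-cpp-python that registers the "llama-3" chat format; the model path and messages are placeholders.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="path/to/Meta-Llama-3-8B-Instruct.gguf",  # placeholder path
    chat_format="llama-3",  # apply the Llama 3 chat template to the messages
    n_ctx=4096,
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain RAG in one paragraph."},
    ],
    temperature=0.7,
)
print(out["choices"][0]["message"]["content"])
```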

inf-co