Open Source RAG Chatbot with Gemma and Langchain | (Deploy LLM on-prem)

In this video, I show how to serve an open-source LLM and embedding model on-prem to build a Retrieval Augmented Generation (RAG) chatbot. Starting from the RAG-GPT chatbot, I swap out the GPT model for *Google Gemma 7B* as the LLM and replace text-embedding-ada-002 with *BAAI/bge-large-en* from Hugging Face. I use Flask to develop a web server that serves the LLM for real-time inference, and I show how to use Postman to develop and debug this kind of project.
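To make the serving setup concrete, here is a minimal sketch of a Flask endpoint wrapping Gemma 7B via Hugging Face transformers. This is not the video's exact code: the `/generate` route, JSON schema, and port are illustrative assumptions, and a 7B model needs a sizeable GPU (or quantization) to load.

```python
# Minimal sketch: a Flask server exposing Gemma 7B for inference.
# Route name, JSON fields, and port are assumptions for illustration.
from flask import Flask, request, jsonify
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

app = Flask(__name__)

MODEL_ID = "google/gemma-7b-it"  # instruction-tuned Gemma; gated on the Hub
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # half precision to fit the model on one GPU
    device_map="auto",
)

@app.route("/generate", methods=["POST"])
def generate():
    prompt = request.json["prompt"]
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    # Slice off the prompt tokens so only the completion is returned
    completion = outputs[0][inputs["input_ids"].shape[1]:]
    return jsonify({"response": tokenizer.decode(completion, skip_special_tokens=True)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8888)
```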

00:00 Intro
01:22 Demo
03:12 RAG-GPT schema
04:39 RAG-Gemma schema
05:19 Challenges of open-source LLMs for chatbots
07:15 A possible solution for serving LLMs on-prem
08:02 Lost in the middle (a tip for context length; see the sketch after the chapter list)
09:26 Project structure
11:18 How to load and interact with Gemma
13:06 Developing LLM web server with Flask
20:38 Testing and debugging the LLM web server with Postman
25:00 Testing the RAG chatbot
27:22 GPU usage of this chatbot
27:48 Do we need a web server for the embedding model?
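On the "lost in the middle" tip (08:02): long-context models attend best to the beginning and end of a prompt, so it helps to reorder retrieved chunks so the most relevant ones sit at the edges. A minimal sketch using LangChain's built-in transformer (the sample documents are placeholders for real retriever output):

```python
from langchain_core.documents import Document
from langchain_community.document_transformers import LongContextReorder

# Stand-in for retriever output, sorted most-relevant-first
docs = [Document(page_content=f"chunk {i}") for i in range(6)]

# Pushes the highest-ranked chunks to the start and end of the list,
# leaving the least relevant ones "lost in the middle"
reordered = LongContextReorder().transform_documents(docs)
print([d.page_content for d in reordered])
```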

🎓 *Models that are used in this chatbot:*

📚 *Extra Resources:*

#opensource #llm #rag #chatbot #huggingface #google #gemma #gradio #langchain #Flask #python #GUI #postman
Comments

I have a question. Once you start using this app and uploading documents with the 'upload PDF or doc file' button, do the documents stay in the app's data so you can use them whenever you want, or will they be deleted? (Sorry if you answered this in the video; I probably missed it.)

NinVibe

Just another fantastic video! Thanks Farzad

omidsa

Hi. Thanks for your detailed, step-by-step walkthrough of the code. I'm enjoying and learning at the same time. 🙏

ginisksam

Great stuff. Instead of using PDFs, can you do a tutorial on using a large CSV file for RAG?

horyekhunley

Thanks for your efforts. I really like your explanation style. I have a question that really bugs me :) How do you control the LLM's response and make sure it comes from the RAG context? Are there any specific techniques for that? Also, how do you give the LLM feedback about its responses? I can see you showing thumbs up and down here; where is this feedback saved, and how does it inform the LLM? Sorry if my question is basic (maybe for others), but for me these points are very important to understand. Thank you again.

zfurlho
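On the grounding question above: a common technique (not necessarily the one used in this video) is to constrain the model with a prompt that forbids answering from outside the retrieved context. As for thumbs up/down, that kind of feedback is typically just logged to a file or database for later analysis or fine-tuning rather than fed back to the model live. A sketch of such a grounding prompt, with hypothetical names:

```python
# Hypothetical grounding prompt; the refusal instruction keeps answers tied
# to the retrieved chunks instead of the model's own knowledge
GROUNDED_PROMPT = """Answer the user's question using ONLY the context below.
If the answer is not in the context, reply exactly:
"I don't know based on the provided documents."

Context:
{context}

Question: {question}
Answer:"""

def build_prompt(retrieved_chunks: list[str], question: str) -> str:
    # Join the retriever's chunks into one context block
    return GROUNDED_PROMPT.format(
        context="\n\n".join(retrieved_chunks), question=question
    )

print(build_prompt(["The notice period is 30 days."], "What is the notice period?"))
```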

This is great, but I think it would be better if you did your projects on Colab.

tk-ttbw

This is awesome, thank you! One thing though: by now almost everyone has said that Gemma is a weak model, and many experiments on YouTube show it underperforming. How about using a different open LLM? Can you experiment with LM Studio?

Mike-Denver

21:22 Where did you get that URL for the POST request? I don't see any URL like that in the output.

KhanhLe-puwx
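On the URL question above: the address Postman hits is not printed in the model's output; it is the host and port passed to `app.run()` plus the path in the `@app.route` decorator. The equivalent request in Python, assuming the illustrative `/generate` endpoint and port from the sketch near the top of this page:

```python
import requests

# Hypothetical endpoint/port matching the Flask sketch earlier on this page
resp = requests.post(
    "http://localhost:8888/generate",
    json={"prompt": "Summarize the uploaded document."},
    timeout=120,  # generation on a 7B model can take a while
)
print(resp.json()["response"])
```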

I appreciate this example of a local LLM RAG deployment as well as your attention to detail. I wish every presenter was this thorough!

I have the Gradio UI working for queries against the documents that I ingested in the manual step. However, when I attempt to upload PDFs from the interface, I get an "Error" message on the screen. When I look at the app.py terminal, the error is: "TypeError: isfile: path should be string, bytes, os.PathLike or integer, not _TemporaryFileWrapper"

Any idea what might be going wrong with the filepath and what needs to be changed?

FYI, I had to pin Gradio to version 3.48.0 to overcome the error "cannot import name 'RootModel' from 'pydantic'"; perhaps this caused the problem above?

doctorbill
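On the `_TemporaryFileWrapper` error above: in Gradio 3.x, upload components pass your callback a tempfile wrapper object rather than a path string, so any `os.path` call on it fails; unwrapping the `.name` attribute is the usual fix (and the pinned Gradio version plausibly explains why the behavior differs from the video). A sketch of the pattern, with a stand-in ingestion helper:

```python
import os

def ingest_pdf(path: str) -> None:
    print(f"ingesting {path}")  # stand-in for the project's real ingestion step

def process_uploaded_files(files):
    for f in files:
        # Gradio 3.x passes tempfile._TemporaryFileWrapper objects, not path
        # strings; the real path on disk lives in the .name attribute
        path = f if isinstance(f, str) else f.name
        if os.path.isfile(path):
            ingest_pdf(path)
```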

Hi Farzad, how are you?
I'm testing RAG platforms. Currently I'm using privateGPT + Ollama (Mistral Instruct fp16 v0.2) + bge-m3 + bge-reranker-large, and I have a 3090. Inference is super fast, but the results are not satisfying.

My question is: what's the best RAG platform right now if you want to go fully open source? Also, please try your RAG with more complex PDF files, not easy text files like a story.

saeednsp

Hey! Thanks for the lovely videos.
Just a question: how do I run this without a GPU? It's asking me for an NVIDIA driver, which is not installed on my system, and I don't intend to install it either. Is there any workaround you can suggest?

ashwinisivanandan

Hi, I have a question. I already implemented RAG with Mistral Dolphin 7B and tested some advanced RAG techniques like ensemble, parent-child, and multi-query retrieval. I don't have a GPU, so I run my LLM on an LM Studio server. In your video you say we need a GPU; can we also use a CPU and build the same interface and project? I also want to deploy my app on a server. In your video you use Flask to deploy the app, but only locally; what do I have to do to deploy it on a server? I also came across the Heroku platform for deploying applications with a GPU. I am confused: can I build the application like you did on my local system without a GPU for testing, and then deploy it on a server?

AliFarooq-ygfn
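On the GPU questions above: a 7B model can run on CPU, just slowly, and quantized GGUF builds served through llama.cpp-style runtimes such as LM Studio are usually the more practical CPU path. A sketch of a plain CPU load in transformers (expect on the order of 28 GB of RAM at float32):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    torch_dtype=torch.float32,  # CPUs generally lack fast bfloat16 kernels
)
# With no device_map and no .to("cuda"), transformers keeps the model on CPU

inputs = tokenizer("What is RAG?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```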

These videos are amazing! Can you release a version with the document-summary task added back, subject to the LLM model? Thanks!

musumo

Are there any good open-source embedding models? If someone wants to keep their data private, won't ada require you to send your data to OpenAI?

stbrfex
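On the embedding question above: yes, text-embedding-ada-002 requires sending your text to OpenAI's API, which is exactly why this video swaps in BAAI/bge-large-en, which runs fully locally. A sketch of using it through LangChain's Hugging Face wrapper (requires the sentence-transformers package):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-large-en",
    encode_kwargs={"normalize_embeddings": True},  # recommended for bge models
)
vector = embeddings.embed_query("What does the contract say about termination?")
print(len(vector))  # bge-large produces 1024-dimensional vectors
```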

Hi,
How can open-source tools and frameworks be used to evaluate the performance of a Retrieval-Augmented Generation (RAG) system that integrates large language models (LLMs) like Google Gemma?

TooyAshy-
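On the evaluation question above: one open-source option is Ragas, which scores a RAG pipeline on metrics like faithfulness, answer relevancy, and context precision. A sketch with placeholder data; note that Ragas column names and defaults shift between versions, and it uses an OpenAI judge model by default, so a fully open-source setup means configuring a local LLM and embeddings for `evaluate()`:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# Placeholder evaluation set: each row pairs a question with the chatbot's
# answer, the chunks it retrieved, and a reference answer
data = {
    "question": ["What is the notice period?"],
    "answer": ["The notice period is 30 days."],
    "contexts": [["Either party may terminate with 30 days written notice."]],
    "ground_truth": ["30 days"],
}

result = evaluate(
    Dataset.from_dict(data),
    metrics=[faithfulness, answer_relevancy, context_precision],
)
print(result)  # dict-like mapping of metric name to score
```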