Open Source RAG Chatbot with Gemma and Langchain | (Deploy LLM on-prem)
In this video, I show how to serve your own open-source LLM and embedding model on-prem to build a Retrieval-Augmented Generation (RAG) chatbot. Starting from the RAG-GPT chatbot, I replace the GPT model with *Google Gemma 7B* as the LLM, and text-embedding-ada-002 with *BAAI/bge-large-en* from Hugging Face. I use Flask to develop a web server that serves the LLM for real-time inference, and I show how to use Postman to develop and debug these kinds of projects.
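A minimal sketch of such a Flask endpoint, with the model call stubbed out so it runs without a GPU or model weights (the real server would load Gemma via Hugging Face transformers; the route name and JSON field names here are my own assumptions, not the video's exact code):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# In the real on-prem server, this stub would be replaced by loading Gemma, e.g.:
#   from transformers import AutoTokenizer, AutoModelForCausalLM
#   tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
#   model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto")
def generate(prompt: str) -> str:
    """Stubbed generation so the sketch is runnable as-is."""
    return f"[Gemma reply to: {prompt[:40]}]"

@app.route("/generate", methods=["POST"])  # endpoint name is an assumption
def generate_endpoint():
    data = request.get_json(force=True)
    prompt = data.get("prompt", "")
    if not prompt:
        return jsonify({"error": "missing 'prompt'"}), 400
    return jsonify({"response": generate(prompt)})
```

The RAG chatbot then calls this endpoint over HTTP instead of the OpenAI API, which is what makes swapping GPT for an on-prem Gemma possible without changing the rest of the pipeline.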
00:00 intro
01:22 Demo
03:12 RAG-GPT schema
04:39 RAG-Gemma schema
05:19 Challenges of open-source LLMs for chatbots
07:15 A possible solution for serving LLMs on-prem
08:02 Lost in the middle (A tip for context-length)
09:26 Project structure
11:18 How to load and interact with Gemma
13:06 Developing LLM web server with Flask
20:38 Testing and debugging the LLM web server with Postman
25:00 Testing the RAG chatbot
27:22 GPU usage of this chatbot
27:48 Do we need a web server for the embedding model?
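The "lost in the middle" chapter refers to the finding that LLMs attend best to information at the beginning and end of a long context, so the most relevant retrieved chunks should be placed at the edges rather than in rank order. A small sketch of that reordering (similar in spirit to LangChain's LongContextReorder; the function name is my own):

```python
def reorder_for_long_context(docs):
    """Place the highest-ranked documents at the start and end of the
    context, pushing the least relevant ones toward the middle, since
    LLMs tend to overlook content "lost in the middle".

    `docs` is assumed to be sorted from most to least relevant.
    """
    front, back = [], []
    for i, doc in enumerate(docs):
        # Alternate: even ranks go to the front, odd ranks to the back.
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    # Reverse the back half so the 2nd-ranked doc ends up last in context.
    return front + back[::-1]
```

For five chunks ranked d1..d5, this yields [d1, d3, d5, d4, d2]: the two most relevant chunks sit at the two ends of the prompt.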
🎓 *Models that are used in this chatbot:*
📚 *Extra Resources:*
#opensource #llm #rag #chatbot #huggingface #google #gemma #gradio #langchain #Flask #python #GUI #postman
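What Postman does in the video (sending a JSON request to the LLM server and inspecting the reply) can also be scripted from Python with the standard library. The URL, port, and field names below are assumptions that would need to match your own Flask server:

```python
import json
import urllib.request

def query_llm_server(prompt, url="http://localhost:8888/generate"):
    """POST a prompt to the on-prem LLM server and return the reply text.

    Assumes the server accepts {"prompt": ...} and answers {"response": ...}.
    """
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]
```

This is handy for quick regression checks once the Postman exploration is done, since the same request can be dropped into a test script.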