Superfast RAG with Llama 3 and Groq

The Groq API provides access to Language Processing Units (LPUs), Groq's custom chips for incredibly fast LLM inference. The service offers several LLMs, including Meta's Llama 3. In this video, we'll implement a RAG pipeline using Llama 3 70B via Groq, an open-source e5 encoder, and the Pinecone vector database.
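For reference, a minimal sketch of that pipeline, assuming the `groq`, `pinecone`, and `sentence-transformers` Python packages, a pre-populated Pinecone index, and API keys in environment variables. The index name, metadata field, prompt, and e5 variant are illustrative, not taken from the video:

```python
import os

from groq import Groq
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

# e5 encoders expect "query: " / "passage: " prefixes on their inputs.
encoder = SentenceTransformer("intfloat/e5-base-v2")

# Assumes an existing index populated with e5 "passage: " embeddings,
# each vector carrying its source text under metadata["text"].
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("llama-3-rag")  # hypothetical index name

groq = Groq(api_key=os.environ["GROQ_API_KEY"])

def retrieve(query: str, top_k: int = 5) -> list[str]:
    # Embed the query with the e5 "query: " prefix, then search Pinecone.
    xq = encoder.encode(f"query: {query}").tolist()
    res = index.query(vector=xq, top_k=top_k, include_metadata=True)
    return [m.metadata["text"] for m in res.matches]

def generate(query: str) -> str:
    # Stuff the retrieved passages into the system prompt (illustrative prompt).
    context = "\n---\n".join(retrieve(query))
    chat = groq.chat.completions.create(
        model="llama3-70b-8192",  # Llama 3 70B as served by Groq
        messages=[
            {"role": "system",
             "content": f"Answer using only the context below.\n\nContext:\n{context}"},
            {"role": "user", "content": query},
        ],
    )
    return chat.choices[0].message.content

print(generate("What is special about Llama 3?"))
```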

📌 Code:

🌲 Subscribe for Latest Articles and Videos:

👋🏼 AI Consulting:

👾 Discord:

#artificialintelligence #llama3 #groq

00:00 Groq and Llama 3 for RAG
00:37 Llama 3 in Python
04:25 Initializing e5 for Embeddings
05:56 Using Pinecone for RAG
07:24 Why We Concatenate Title and Content
10:15 Testing RAG Retrieval Performance
11:28 Initializing Connection to the Groq API
12:24 Generating RAG Answers with Llama 3 70B
14:37 Final Points on Why Groq Matters
Comments

Hi James, Microsoft just open-sourced their GraphRAG technology stack; might be cool to take a look and see how we can leverage/combine the two.

awakenwithoutcoffee

Nice walkthrough, and I agree that Groq is amazing... just wish they had other models.

alexjensen

The nice thing is that you can use Groq with LangChain as well.

tiagoc
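
A minimal sketch of the LangChain route mentioned above, assuming the `langchain-groq` package (the model id matches what Groq served at the time; the call itself is not from the video):

```python
from langchain_groq import ChatGroq

# ChatGroq wraps Groq's chat completions behind LangChain's chat interface;
# it reads GROQ_API_KEY from the environment by default.
llm = ChatGroq(model="llama3-70b-8192", temperature=0)
print(llm.invoke("Summarize what an LPU is in one sentence.").content)
```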

What are your thoughts on adding a short summary of the document or paper to each chunk, along with the title?

gilbertb
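
For what it's worth, one hypothetical way to lay out such a chunk before embedding (the helper, field names, and summary source are illustrative, not from the video):

```python
def build_chunk_text(title: str, summary: str, content: str) -> str:
    # Prepend document-level context so each chunk carries global signal;
    # e5 passage embeddings expect the "passage: " prefix.
    return f"passage: {title}\n{summary}\n{content}"
```

The trade-off is that the title and summary consume part of the encoder's token window on every chunk, leaving less room for the chunk content itself.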

You're in Bali, nice! I'm looking for an online job, mate. I'm pretty desperate at this point.

content_ai_

Is there any OSS embedding model you'd recommend over e5 for real/prod use cases? I've just used OpenAI so far.

tiagoc

Is this reusable in such a way that we can switch from calling Groq to calling OpenAI's GPT-4o or other models?

Davorge
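
Since Groq exposes an OpenAI-compatible endpoint, one way to make the generation step swappable is to use the `openai` client for both providers and change only the base URL and model name; a minimal sketch, with the provider routing and model ids as illustrative choices:

```python
import os

from openai import OpenAI

def make_client(provider: str) -> tuple[OpenAI, str]:
    # Same client class for both providers; only base_url and model differ.
    if provider == "groq":
        client = OpenAI(
            base_url="https://api.groq.com/openai/v1",
            api_key=os.environ["GROQ_API_KEY"],
        )
        return client, "llama3-70b-8192"
    return OpenAI(api_key=os.environ["OPENAI_API_KEY"]), "gpt-4o"

client, model = make_client("groq")
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```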