Better Llama 2 with Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) lets us keep our Large Language Models (LLMs) up to date with the latest information, reduce hallucinations, and cite the original sources of the information the LLM uses.

We build the RAG pipeline using a Pinecone vector database and a Llama 2 13B chat model, wrapping everything together with Hugging Face and LangChain code.
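Below is a minimal sketch of the query side of that pipeline, assuming a typical LangChain + Pinecone setup; the embedding model, index name, and generation settings are illustrative and may differ from the notebook:

```python
# Minimal RAG query pipeline: Pinecone retriever + Llama 2 13B chat via Hugging Face,
# wired together with LangChain. Requires a Hugging Face token with Llama 2 access.
from torch import cuda
import pinecone
import transformers
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

device = f"cuda:{cuda.current_device()}" if cuda.is_available() else "cpu"

# open-source embedding model used to embed both the documents and the query
embed_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": device},
)

# connect to an existing Pinecone index that already holds the embedded chunks
pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_PINECONE_ENV")
vectorstore = Pinecone.from_existing_index("llama-2-rag", embed_model, text_key="text")

# Llama 2 13B chat as a text-generation pipeline (quantization config omitted here)
generate_text = transformers.pipeline(
    model="meta-llama/Llama-2-13b-chat-hf",
    task="text-generation",
    max_new_tokens=512,
    repetition_penalty=1.1,
    use_auth_token="YOUR_HF_TOKEN",
)
llm = HuggingFacePipeline(pipeline=generate_text)

# RAG chain: retrieve relevant chunks from Pinecone, stuff them into the prompt,
# and let Llama 2 answer with that extra context
rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever()
)
print(rag_pipeline("What is so special about Llama 2?"))
```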

📌 Code:

🌲 Subscribe for Latest Articles and Videos:

👋🏼 AI Consulting:

👾 Discord:

00:00 Retrieval Augmented Generation with Llama 2
00:29 Python Prerequisites and Llama 2 Access
01:39 Retrieval Augmented Generation 101
03:53 Creating Embeddings with Open Source
06:23 Building Pinecone Vector DB
08:38 Creating Embedding Dataset
11:45 Initializing Llama 2
14:38 Creating the RAG RetrievalQA Component
15:43 Comparing Llama 2 vs RAG Llama 2

#artificialintelligence #nlp #opensource #llama2
Comments

Thank you for the video. I am gradually transitioning from commercial models to open source. Your videos are very helpful.

micbab-vgmu

🎯 Key Takeaways for quick navigation:

00:13 📌 The aim is to quantize the 13-billion-parameter Llama 2 model so it fits on a single T4 GPU, available for free on Colab.
01:06 ⚙️ The Colab runtime has to be set to use a GPU hardware accelerator with the T4 GPU type.
01:34 💡 Explains the concept of retrieval augmented generation: giving your LLM access to the 'outside world'. The plan is to work with a subset of that outside world by searching it with natural language.
02:52 🔄 The process involves asking a question, retrieving relevant information about that question, and feeding that information back into the LLM.
03:31 🎛️ Discusses the importance of the embedding model for translating human-readable text into machine-readable vectors.
04:56 🧠 Two documents are created and each is embedded using an embedding model.
06:19 📚 Discusses how to create a vector database and build a vector index using a free Pinecone API key.
07:25 ⌨️ Describes the process of initializing the index to store the vectors produced by the embedding model.
09:00 ✍️ Talks about populating the vector database with a small dataset of text chunks from the Llama 2 paper and related documents (see the sketch after this list).
11:54 🐐 Initializes the LLM (Llama 2) needed for the RetrievalQA chain.
14:27 🗝️ Shows how to obtain the Hugging Face Authentication token needed to use Llama 2.
15:47 🥊 Compares the outcome of just asking an LLM a question versus using retrieval augmentation. The latter clearly provides much more relevant information.
19:00 🥇 Indicates that Llama 2 performs better on safety and helpfulness benchmarks than other models like Chinchilla and Bard, and can be on par with closed-source models on certain tasks.
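A rough sketch of the index-building steps summarized above (the index name, embedding model, dimension, and example chunks are illustrative assumptions, not taken from the video):

```python
# Build the Pinecone index and upsert embedded text chunks with metadata.
import time
import pinecone
from langchain.embeddings import HuggingFaceEmbeddings

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_PINECONE_ENV")
embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

index_name = "llama-2-rag"
if index_name not in pinecone.list_indexes():
    # dimension must match the embedding model (384 for all-MiniLM-L6-v2)
    pinecone.create_index(index_name, dimension=384, metric="cosine")
    while not pinecone.describe_index(index_name).status["ready"]:
        time.sleep(1)  # wait until the index is ready
index = pinecone.Index(index_name)

# in the video the chunks come from the Llama 2 paper and related documents;
# these two entries are placeholders
chunks = [
    {"id": "llama2-paper-0", "text": "Llama 2 is a collection of pretrained and fine-tuned LLMs...",
     "source": "llama-2-paper"},
    {"id": "llama2-paper-1", "text": "Llama 2-Chat models are optimized for dialogue use cases...",
     "source": "llama-2-paper"},
]

# embed each chunk and upsert (id, vector, metadata) tuples into the index
embeddings = embed_model.embed_documents([c["text"] for c in chunks])
index.upsert(vectors=[
    (c["id"], emb, {"text": c["text"], "source": c["source"]})
    for c, emb in zip(chunks, embeddings)
])
```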

Made with Socialdraft

AntonioEvans

I am currently utilising Linda, my Linguistically Intelligent Networked Digital Assistant, as a corporate governance tool with overwhelming success. This tiny aspect of what is now possible has the potential to change the world 🤯🧠

nuclear_AI

For the PoC stage, going through this with the Pinecone vector DB is good for getting my head around the concept and putting the pieces together. However, I think most people will want to move to a local vector DB if they are trying to build a use case within a company, since sensitive data should never (as a matter of policy) be stored outside the company domain. That said, Pinecone still has its uses.

senju

The challenge with the RAG + LLM approach is that the actual data sources are vast in most real use cases. When these enormous volumes of documents are split and vectorized, they generate an immense quantity of vectors (millions) in the vector database. Consequently, when trying to retrieve an answer to a specific question, the vectors extracted from the database often fail to closely or precisely align with what you're searching for. Literally like looking for a needle in a haystack...

adilgun

Thanks! Actually your brain diagram thing saved me some time explaining something I built to friends. Lol, though I was expecting you to fill out more parts of the brain. I like those definitions of the types of knowledge in that explanation as well. Good show.

natecodesai

Hi, you've saved the metadata but didn't use it at all. I would improve the code/video by adding SOURCES: to the response. That would also show which text it used to produce the answer, so you can verify that it returned only the relevant text and see how well it summarized what it got. It would also be worth comparing results and explaining when you would increase the number of retrievals (k) and when you would increase the chunk size.
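For what it's worth, LangChain's RetrievalQA can return the retrieved chunks alongside the answer; a hedged sketch, assuming the `llm` and `vectorstore` objects from the notebook and that each chunk was stored with a "source" metadata field:

```python
from langchain.chains import RetrievalQA

# include the retrieved chunks in the output so the sources can be printed
rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
)

result = rag_pipeline({"query": "What safety techniques were used for Llama 2?"})
print(result["result"])
for doc in result["source_documents"]:
    # page_content holds the chunk text; metadata holds whatever was upserted with it
    print("SOURCE:", doc.metadata.get("source"), "|", doc.page_content[:120])
```

As a rule of thumb, increasing k helps when the answer is spread across several small chunks, while larger chunks help when each answer needs more surrounding context, at the cost of prompt length.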

ilanser

Thanks for your tutorials! One suggestion if I may: could we not use the cut-scene transitions (the corrupted-clip effect)? It's too much sensory input all of a sudden and distracts from what you say next. Thank you nonetheless for the content 👏

wayallen

Fantastic tutorial, you deserve 1 million subscribers.

fabianaltendorfer

Fantastic tutorial, you deserve more subscribers. 👍👍👍

robertgoldbornatyout

Thank you for the video.
Suggestion: next in line could be QA generation evaluation using Llama 2. I have tried using open LLM evaluation and found it hard to implement without using OpenAI.

navneetkrc

Thank you James. Your videos are awesome. One odd thing I have noticed: when I load the Llama 2 7B model in a Colab Pro account with a single A100 GPU, the model alone takes 26.5 GB of GPU memory, while in your video the model only takes 8.2 GB. I used the exact same quantization and model settings.
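For reference, 26.5 GB is roughly what a 7B model occupies in full fp32 precision, so a gap like that usually means the quantization config was not actually applied at load time. A minimal sketch of 4-bit loading with bitsandbytes (model name and settings are typical, not copied from the notebook):

```python
import torch
import transformers

# 4-bit NF4 quantization with double quantization; this is what keeps the
# on-GPU footprint of the model to a few GB instead of tens of GB
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,  # if this is omitted, weights load in full precision
    device_map="auto",
    use_auth_token="YOUR_HF_TOKEN",
)
print(model.get_memory_footprint() / 1e9, "GB")  # sanity-check the loaded size
```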

xifanwang

As usual, this is incredible. I tend to trust your commentary because you don't get hyped about features, take a more systematic approach to evaluation, and try to state the facts as they are. Thanks!

paraconscious

This is awesome and will help me immensely. I'm still going through the code so I can learn to implement a more elaborate sandbox with this. In theory, are we now able to get the LLM to cite the document, or even link to it, as long as it's in the embedded data? My org is scared of generative AI due to hallucinations and the ethical issues of unknown training data. These kinds of technical explanations, especially in notebooks, help me explain the value of these open-source models and the techniques to use them with transparency. Having the LLM cite sources and provide links straight to documents is critical to getting over all these legal and political concerns.

beyond

Just discovered you from this video. This is amazing, thank you so much!

BearMan-libe

Great content. Why do you use a non-local vector database (Pinecone) for a local LLM?

javiergimenezmoya

Sorry, but reading the code is much clearer than watching this video, because you basically just read out the comments in the code anyway. The time spent making the video would have been better spent on a diagram of how the different pieces connect. (This is meant as constructive criticism; thanks for the code and for sharing the knowledge.)

tylertheeverlasting

It would be interesting to see how to implement this with a completely private, local, open-source vector database like Chroma.
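For what it's worth, swapping the store is mostly a one-object change in LangChain; a sketch using Chroma, assuming the same `embed_model` and `chunks` objects used for the Pinecone version (a local alternative, not what the video uses):

```python
from langchain.vectorstores import Chroma

# build a local, on-disk vector store from the same chunks instead of Pinecone
texts = [c["text"] for c in chunks]
metadatas = [{"source": c["source"]} for c in chunks]
vectorstore = Chroma.from_texts(
    texts,
    embedding=embed_model,
    metadatas=metadatas,
    persist_directory="./chroma_llama2_rag",  # everything stays on the local machine
)

# the rest of the pipeline is unchanged: use it as the retriever for RetrievalQA
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
```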

scottmiller

Great content James. Your dataset seems interesting; do you have more information about how you created it? How much code would need to be changed to run instead? Can that one fit on a T4?

TzaraDuchamp

Thank you, but after listening for 2 minutes I still don't know what RAG is or why I should use it.

ebudmada