Better Llama 2 with Retrieval Augmented Generation (RAG)

Retrieval Augmented Generation (RAG) lets us keep our Large Language Models (LLMs) up to date with the latest information, reduce hallucinations, and cite the original sources of the information the LLM uses.

We build the RAG pipeline using a Pinecone vector database and a Llama 2 13B chat model, wrapping everything together with Hugging Face and LangChain code.
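Below is a minimal sketch of the query side of that pipeline, assuming a typical LangChain + Pinecone setup; the embedding model, index name, and generation settings are illustrative and may differ from the notebook:

```python
# Minimal RAG query pipeline: Pinecone retriever + Llama 2 13B chat via Hugging Face,
# wired together with LangChain. Requires a Hugging Face token with Llama 2 access.
from torch import cuda
import pinecone
import transformers
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

device = f"cuda:{cuda.current_device()}" if cuda.is_available() else "cpu"

# open-source embedding model used to embed both the documents and the query
embed_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": device},
)

# connect to an existing Pinecone index that already holds the embedded chunks
pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_PINECONE_ENV")
vectorstore = Pinecone.from_existing_index("llama-2-rag", embed_model, text_key="text")

# Llama 2 13B chat as a text-generation pipeline (quantization config omitted here)
generate_text = transformers.pipeline(
    model="meta-llama/Llama-2-13b-chat-hf",
    task="text-generation",
    max_new_tokens=512,
    repetition_penalty=1.1,
    use_auth_token="YOUR_HF_TOKEN",
)
llm = HuggingFacePipeline(pipeline=generate_text)

# RAG chain: retrieve relevant chunks from Pinecone, stuff them into the prompt,
# and let Llama 2 answer with that extra context
rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm, chain_type="stuff", retriever=vectorstore.as_retriever()
)
print(rag_pipeline("What is so special about Llama 2?"))
```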

📌 Code:

🌲 Subscribe for Latest Articles and Videos:

👋🏼 AI Consulting:

👾 Discord:

00:00 Retrieval Augmented Generation with Llama 2
00:29 Python Prerequisites and Llama 2 Access
01:39 Retrieval Augmented Generation 101
03:53 Creating Embeddings with Open Source
06:23 Building Pinecone Vector DB
08:38 Creating Embedding Dataset
11:45 Initializing Llama 2
14:38 Creating the RAG RetrievalQA Component
15:43 Comparing Llama 2 vs RAG Llama 2

#artificialintelligence #nlp #opensource #llama2
Comments

Thank you for the video. I am gradually transitioning from commercial models to open source. Your videos are very helpful.

micbab-vgmu

🎯 Key Takeaways for quick navigation:

00:13 📌 The aim is to quantize the 13-billion-parameter Llama 2 model so it fits on a single T4 GPU, available for free on Colab.
01:06 ⚙️ The Colab runtime has to be set to use a GPU hardware accelerator with the T4 GPU type.
01:34 💡 Explains the concept of retrieval augmented generation: giving your LLM access to the 'outside world'. The plan is to work with a subset of that outside world by searching it with natural language.
02:52 🔄 The process involves asking a question, retrieving relevant information about that question, and feeding that information back into the LLM.
03:31 🎛️ Discusses the importance of the embedding model for translating human-readable text into machine-readable vectors.
04:56 🧠 Two documents are created and each is embedded using an embedding model.
06:19 📚 Discusses how to create a vector database and build a vector index using a free Pinecone API key.
07:25 ⌨️ Describes the process of initializing the index to store the vectors produced by the embedding model.
09:00 ✍️ Talks about populating the vector database with a small dataset of text chunks from the Llama 2 paper and related documents (see the sketch after this list).
11:54 🐐 Initializes the LLM (Llama 2) needed for the RetrievalQA chain.
14:27 🗝️ Shows how to obtain the Hugging Face Authentication token needed to use Llama 2.
15:47 🥊 Compares the outcome of just asking an LLM a question versus using retrieval augmentation. The latter clearly provides much more relevant information.
19:00 🥇 Indicates that Llama 2 performs better on safety and helpfulness benchmarks than other models like Chinchilla and Bard, and can be on par with closed-source models on certain tasks.
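A rough sketch of the index-building steps summarized above (the index name, embedding model, dimension, and example chunks are illustrative assumptions, not taken from the video):

```python
# Build the Pinecone index and upsert embedded text chunks with metadata.
import time
import pinecone
from langchain.embeddings import HuggingFaceEmbeddings

pinecone.init(api_key="YOUR_PINECONE_API_KEY", environment="YOUR_PINECONE_ENV")
embed_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

index_name = "llama-2-rag"
if index_name not in pinecone.list_indexes():
    # dimension must match the embedding model (384 for all-MiniLM-L6-v2)
    pinecone.create_index(index_name, dimension=384, metric="cosine")
    while not pinecone.describe_index(index_name).status["ready"]:
        time.sleep(1)  # wait until the index is ready
index = pinecone.Index(index_name)

# in the video the chunks come from the Llama 2 paper and related documents;
# these two entries are placeholders
chunks = [
    {"id": "llama2-paper-0", "text": "Llama 2 is a collection of pretrained and fine-tuned LLMs...",
     "source": "llama-2-paper"},
    {"id": "llama2-paper-1", "text": "Llama 2-Chat models are optimized for dialogue use cases...",
     "source": "llama-2-paper"},
]

# embed each chunk and upsert (id, vector, metadata) tuples into the index
embeddings = embed_model.embed_documents([c["text"] for c in chunks])
index.upsert(vectors=[
    (c["id"], emb, {"text": c["text"], "source": c["source"]})
    for c, emb in zip(chunks, embeddings)
])
```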

Made with Socialdraft

AntonioEvans

I am currently utilising Linda, my Linguistically Intelligent Networked Digital Assistant, as a corporate governance tool with overwhelming success. This tiny aspect of what is now possible has the potential to change the world 🤯🧠

nuclear_AI

For the PoC stage, going through this with the Pinecone vector DB is good for getting my head around the concept and putting the pieces together. However, I think most people will want to move to a local vector DB if they are trying to build a use case within a company, since sensitive data should never (as a matter of policy) be stored outside the company domain. That said, Pinecone still has its uses.

senju

The challenge with the RAG + LLM approach is that the actual data sources are vast in most real use cases. When these enormous volumes of documents are split and vectorized, they generate an immense quantity of vectors (millions) in the vector database. Consequently, when trying to retrieve an answer to a specific question, the vectors extracted from the database often fail to closely or precisely align with what you're searching for. Literally like looking for a needle in a haystack...

adilgun

Thanks! Actually your brain diagram thing saved me some time explaining something I built to friends. Lol, though I was expecting you to fill out more parts of the brain. I like those definitions of the types of knowledge in that explanation as well. Good show.

natecodesai

Hi, you've saved the metadata but didn't use it at all. I would improve the code/video by adding SOURCES: to the response. That would also show which text it used to produce the answer, so you can verify that it returned only the relevant text and see how well it summarized what it got. It would also be worth comparing results and explaining when you would increase the number of retrievals (k) and when you would increase the chunk size.
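For what it's worth, LangChain's RetrievalQA can return the retrieved chunks alongside the answer; a hedged sketch, assuming the `llm` and `vectorstore` objects from the notebook and that each chunk was stored with a "source" metadata field:

```python
from langchain.chains import RetrievalQA

# include the retrieved chunks in the output so the sources can be printed
rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
)

result = rag_pipeline({"query": "What safety techniques were used for Llama 2?"})
print(result["result"])
for doc in result["source_documents"]:
    # page_content holds the chunk text; metadata holds whatever was upserted with it
    print("SOURCE:", doc.metadata.get("source"), "|", doc.page_content[:120])
```

As a rule of thumb, increasing k helps when the answer is spread across several small chunks, while larger chunks help when each answer needs more surrounding context, at the cost of prompt length.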

ilanser

Thanks for your tutorials! One suggestion if I may: could we not use the cut-scene transitions (the corrupted-clip effect)? It's too much sensory input all of a sudden and distracts from what you say next. Thank you nonetheless for the content 👏

wayallen

Fantastic tutorial, you deserve 1 million subscribers.

fabianaltendorfer

Fantastic tutorial, you deserve more subscribers. 👍👍👍

robertgoldbornatyout

Thank you for the video.
Suggestion: next in line could be QA generation evaluation using Llama 2. I have tried using open LLM evaluation and found it hard to implement without using OpenAI.

navneetkrc

Thank you James. Your videos are awesome. One odd thing I have noticed: when I load the Llama 2 7B model in a Colab Pro account with a single A100 GPU, the model alone takes 26.5 GB of GPU memory, while in your video the model only takes 8.2 GB. I used the exact same quantization and model settings.
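For reference, 26.5 GB is roughly what a 7B model occupies in full fp32 precision, so a gap like that usually means the quantization config was not actually applied at load time. A minimal sketch of 4-bit loading with bitsandbytes (model name and settings are typical, not copied from the notebook):

```python
import torch
import transformers

# 4-bit NF4 quantization with double quantization; this is what keeps the
# on-GPU footprint of the model to a few GB instead of tens of GB
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    quantization_config=bnb_config,  # if this is omitted, weights load in full precision
    device_map="auto",
    use_auth_token="YOUR_HF_TOKEN",
)
print(model.get_memory_footprint() / 1e9, "GB")  # sanity-check the loaded size
```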

xifanwang

As usual, this is incredible. I tend to trust your commentary because you don't get hyped about features, take a more systematic approach to evaluation, and try to state the facts as they are. Thanks!

paraconscious

This is awesome and will help me immensely. I'm still going through the code so I can learn to implement a more elaborate sandbox with this. In theory, are we now able to get the LLM to cite the document, or even link to it, as long as it's in the embedded data? My org is scared of generative AI due to hallucinations and the ethical issues of unknown training data. These kinds of technical explanations, especially in notebooks, help me explain the value of these open-source models and the techniques to use them with transparency. Having the LLM cite sources and provide links straight to documents is critical to getting over all these legal and political concerns.

beyond

Just discovered you from this video. This is amazing, thank you so much!

BearMan-libe

Great content. Why do you use a non-local vector database (Pinecone) for a local LLM?

javiergimenezmoya

Sorry, but reading the code is much clearer than watching this video, because you basically just read out the comments in the code anyway. The time spent making the video would have been better spent on a diagram of how the different pieces connect. (This is meant as constructive criticism; thanks for the code and for sharing the knowledge.)

tylertheeverlasting

It would be interesting to see how to implement this with a completely private, local, open-source vector database like Chroma.
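For what it's worth, swapping the store is mostly a one-object change in LangChain; a sketch using Chroma, assuming the same `embed_model` and `chunks` objects used for the Pinecone version (a local alternative, not what the video uses):

```python
from langchain.vectorstores import Chroma

# build a local, on-disk vector store from the same chunks instead of Pinecone
texts = [c["text"] for c in chunks]
metadatas = [{"source": c["source"]} for c in chunks]
vectorstore = Chroma.from_texts(
    texts,
    embedding=embed_model,
    metadatas=metadatas,
    persist_directory="./chroma_llama2_rag",  # everything stays on the local machine
)

# the rest of the pipeline is unchanged: use it as the retriever for RetrievalQA
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
```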

scottmiller

Great content James. Your dataset seems interesting; do you have more information about how you created it? How much code would need to be changed to run instead? Can that one fit on a T4?

TzaraDuchamp

Thank you, but after listening for 2 minutes I still don't know what RAG is or why I should use it.

ebudmada