The Best RAG Technique Yet? Anthropic’s Contextual Retrieval Explained!

Anthropic has introduced a new retrieval technique called contextual retrieval, which prepends LLM-generated context to each chunk and combines semantic search with keyword-based BM25 search and re-ranking to significantly improve retrieval performance. In this video, I explain how this technique enhances retrieval accuracy, including practical implementation steps and benchmark results. Learn how to optimize your RAG systems by adding contextual embeddings, keyword-based BM25 indexing, and re-ranking to achieve state-of-the-art results.
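For readers who want to see the moving parts, here is a minimal sketch of the pipeline described above. It is not Anthropic's reference code: the model name, the 0.8/0.2 score weighting, and helper names like situate_chunk and search are my own assumptions, and it assumes the anthropic, rank_bm25, sentence-transformers, and numpy packages plus an ANTHROPIC_API_KEY in the environment.

```python
# Sketch of contextual retrieval: situate each chunk with LLM-generated
# context, index it both semantically and with BM25, then re-rank.
import numpy as np
import anthropic
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

client = anthropic.Anthropic()                                   # reads ANTHROPIC_API_KEY
embedder = SentenceTransformer("all-MiniLM-L6-v2")               # illustrative embedding model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative re-ranker

def situate_chunk(document: str, chunk: str) -> str:
    """Ask the LLM for a short context that situates the chunk within the document."""
    prompt = (
        f"<document>\n{document}\n</document>\n"
        f"Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n"
        "Give a short context that situates this chunk within the overall document "
        "to improve search retrieval. Answer with the context only."
    )
    response = client.messages.create(
        model="claude-3-haiku-20240307",   # assumed model choice
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def build_indexes(document: str, chunks: list[str]):
    # Prepend the generated context to every chunk before indexing.
    contextualized = [f"{situate_chunk(document, c)}\n\n{c}" for c in chunks]
    embeddings = embedder.encode(contextualized)                     # semantic index
    bm25 = BM25Okapi([c.lower().split() for c in contextualized])    # keyword index
    return contextualized, embeddings, bm25

def search(query: str, contextualized, embeddings, bm25, top_k: int = 5):
    # Hybrid retrieval: combine cosine similarity with normalized BM25 scores.
    q_emb = embedder.encode([query])[0]
    sem = embeddings @ q_emb / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q_emb) + 1e-9
    )
    kw = bm25.get_scores(query.lower().split())
    kw = kw / (kw.max() + 1e-9)
    candidates = np.argsort(-(0.8 * sem + 0.2 * kw))[: top_k * 4]    # illustrative weights
    # Re-rank the shortlisted chunks with a cross-encoder and keep the best.
    scores = reranker.predict([(query, contextualized[i]) for i in candidates])
    order = np.argsort(-scores)[:top_k]
    return [contextualized[candidates[i]] for i in order]
```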

LINKS:

💻 RAG Beyond Basics Course:

Let's Connect:

Sign up for the Newsletter, localgpt:

00:00 Introduction to Contextual Retrieval
00:20 Understanding RAG Systems
00:55 Combining Semantic and Keyword Search
01:44 Challenges with Standard RAG Systems
02:48 Anthropic's Contextual Retrieval Approach
03:37 Implementing Contextual Retrieval
07:06 Performance Improvements and Benchmarks
09:02 Best Practices for RAG Systems
12:48 Code Example and Practical Implementation
15:21 Conclusion and Final Thoughts

All Interesting Videos:

Comments

For anyone wondering, I did try these methods (contextual retrieval + reranking) with a local model on my laptop. The RAG part works great, but importing new documents takes a while because of the chunking, summary generation, and embedding generation. Re-ranking on a local model is surprisingly fast and really good with the right model. If you're building an application using RAG, I'd suggest making adding docs the very first step of onboarding, because you can then do all of the chunking etc. in the background. The user might expect a real-time drag -> drop -> ask-question workflow, but it won't work like that unless you're using models in the cloud. Also, remember to chunk, summarize, and generate embeddings simultaneously, not one chunk after another, as that will of course take longer for your end user.

tvwithtiffani
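A rough sketch of the "chunk, summarize, and embed simultaneously" idea from the comment above, not the exact pipeline used in the video. It reuses situate_chunk and embedder from the earlier sketch; the process_chunk helper and the worker count are illustrative assumptions.

```python
# Ingest chunks concurrently instead of one after another, so document
# onboarding can run in the background during application setup.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(document: str, chunk: str) -> dict:
    # Generate the situating context (LLM call) and embed the contextualized
    # chunk; both steps are slow, which is why several chunks run at once.
    text = f"{situate_chunk(document, chunk)}\n\n{chunk}"
    return {"text": text, "embedding": embedder.encode([text])[0]}

def ingest_in_background(document: str, chunks: list[str], workers: int = 4):
    # Map over all chunks in parallel rather than sequentially.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: process_chunk(document, c), chunks))
```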

Sending my best to the little one in the background!

BinWang-bf

Thanks, very interesting. Many ideas came to mind for improving RAG by enhancing chunks.

tomwawer

Applying this to local models for large document repos seems like a good combo to increase RAG performance. I wonder how you would optimize for the local environment.

IAMCFarm

Working with this now and didn’t use the new caching method 😫. Nice to have someone else run through this 🎉😆

seanwood

Do you maybe know what is going on in GPT Assistants? Their RAG is really efficient and accurate: they use a default of 800-token chunks with 400-token overlap, and it seems to work really well. Perhaps they also use some kind of re-ranker? Maybe you know.

MatichekYoutube

Thought it was my baby, but it was yours in the background 😂😂

anubisai

Best wishes for the kid in the background

megamehdi

Thanks for the easy-to-understand explanation.

vikramn

What if the document is so big that it can't fit in the LLM context window? How do we get the contextual chunks then?
If we break the document into smaller segments/documents to implement this approach, won't it lose some context?

SunilM-xo

Hasn't structured graphRAG already solved this? Find the structured data using a graph, then navigate it to pull the exact example?

ic_jason

I was looking all over in the house and outside for the source of that sound. I thought the neighborhood cats were having a conference on my porch!

aaronjsolomon

How is the diagram generated/built at 0:48 for RAG embeddings?

PeterJung-cxib

What happens if the document contains a lot of images, such as tables, charts, and so on? Can we still chunk the document in the normal way, e.g., by setting a chunk size?

limjuroy

I think the baby in the background disagrees :p

jackbauer

How do you generate the context for a chunk without giving the LLM sufficient information about that chunk? How are they getting the information about the revenue numbers in that example? If it is extracted from the whole document, the LLM cost will be painful.

souvickdas

I want to add this as the default way RAG is handled in Open WebUI, but it's conflicting with other stuff. I tried to make a custom pipeline for it, but I'm struggling to make it work. Is it out of the scope of Open WebUI, or am I just not understanding the documentation properly?

DRMEDAHMED

Can someone suggest something for larger documents (above 500 pages)? Normal RAG is not very accurate, and we can't use contextual RAG because we need to pass the whole document in the prompt, which exceeds the token limit.

HarmanSingh-wpin

What tool did you use to record the video?

LatifAmars

Losing the context in RAG is a real issue that can destroy all usefulness.
I have read that a combination of chunks and graphs is a way to overcome that.
But I have not tested it with a use case myself yet.

konstantinlozev