The Best RAG Technique Yet? Anthropic’s Contextual Retrieval Explained!

Anthropic has introduced a new retrieval technique called contextual retrieval, which prepends LLM-generated context to each chunk and combines semantic search with keyword-based BM25 search and re-ranking to significantly improve retrieval performance. In this video, I explain how this technique enhances retrieval accuracy, including practical implementation steps and benchmark results. Learn how to optimize your RAG systems by adding contextual embeddings, keyword-based BM25 indexing, and re-ranking to achieve state-of-the-art results.
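For readers who want to see the moving parts, here is a minimal sketch of the pipeline described above. It is not Anthropic's reference code: the model name, the 0.8/0.2 score weighting, and helper names like situate_chunk and search are my own assumptions, and it assumes the anthropic, rank_bm25, sentence-transformers, and numpy packages plus an ANTHROPIC_API_KEY in the environment.

```python
# Sketch of contextual retrieval: situate each chunk with LLM-generated
# context, index it both semantically and with BM25, then re-rank.
import numpy as np
import anthropic
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, CrossEncoder

client = anthropic.Anthropic()                                   # reads ANTHROPIC_API_KEY
embedder = SentenceTransformer("all-MiniLM-L6-v2")               # illustrative embedding model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative re-ranker

def situate_chunk(document: str, chunk: str) -> str:
    """Ask the LLM for a short context that situates the chunk within the document."""
    prompt = (
        f"<document>\n{document}\n</document>\n"
        f"Here is a chunk from the document:\n<chunk>\n{chunk}\n</chunk>\n"
        "Give a short context that situates this chunk within the overall document "
        "to improve search retrieval. Answer with the context only."
    )
    response = client.messages.create(
        model="claude-3-haiku-20240307",   # assumed model choice
        max_tokens=150,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def build_indexes(document: str, chunks: list[str]):
    # Prepend the generated context to every chunk before indexing.
    contextualized = [f"{situate_chunk(document, c)}\n\n{c}" for c in chunks]
    embeddings = embedder.encode(contextualized)                     # semantic index
    bm25 = BM25Okapi([c.lower().split() for c in contextualized])    # keyword index
    return contextualized, embeddings, bm25

def search(query: str, contextualized, embeddings, bm25, top_k: int = 5):
    # Hybrid retrieval: combine cosine similarity with normalized BM25 scores.
    q_emb = embedder.encode([query])[0]
    sem = embeddings @ q_emb / (
        np.linalg.norm(embeddings, axis=1) * np.linalg.norm(q_emb) + 1e-9
    )
    kw = bm25.get_scores(query.lower().split())
    kw = kw / (kw.max() + 1e-9)
    candidates = np.argsort(-(0.8 * sem + 0.2 * kw))[: top_k * 4]    # illustrative weights
    # Re-rank the shortlisted chunks with a cross-encoder and keep the best.
    scores = reranker.predict([(query, contextualized[i]) for i in candidates])
    order = np.argsort(-scores)[:top_k]
    return [contextualized[candidates[i]] for i in order]
```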

LINKS:

💻 RAG Beyond Basics Course:

Let's Connect:

Sign up for the Newsletter, localgpt:

00:00 Introduction to Contextual Retrieval
00:20 Understanding RAG Systems
00:55 Combining Semantic and Keyword Search
01:44 Challenges with Standard RAG Systems
02:48 Anthropic's Contextual Retrieval Approach
03:37 Implementing Contextual Retrieval
07:06 Performance Improvements and Benchmarks
09:02 Best Practices for RAG Systems
12:48 Code Example and Practical Implementation
15:21 Conclusion and Final Thoughts

All Interesting Videos:

Comments

For anyone wondering, I did try these methods (contextual retrieval + reranking) with a local model on my laptop. The RAG part works great, but importing new documents takes a while because of the chunking, summary generation, and embedding generation. Re-ranking on a local model is surprisingly fast and really good with the right model. If you're building an application using RAG, I'd suggest making adding docs the very first step of onboarding, because you can then do all of the chunking etc. in the background. The user might expect a real-time drag -> drop -> ask-question workflow, but it won't work like that unless you're using models in the cloud. Also, remember to chunk, summarize, and generate embeddings simultaneously, not one chunk after another, as that will of course take longer for your end user.

tvwithtiffani
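A rough sketch of the "chunk, summarize, and embed simultaneously" idea from the comment above, not the exact pipeline used in the video. It reuses situate_chunk and embedder from the earlier sketch; the process_chunk helper and the worker count are illustrative assumptions.

```python
# Ingest chunks concurrently instead of one after another, so document
# onboarding can run in the background during application setup.
from concurrent.futures import ThreadPoolExecutor

def process_chunk(document: str, chunk: str) -> dict:
    # Generate the situating context (LLM call) and embed the contextualized
    # chunk; both steps are slow, which is why several chunks run at once.
    text = f"{situate_chunk(document, chunk)}\n\n{chunk}"
    return {"text": text, "embedding": embedder.encode([text])[0]}

def ingest_in_background(document: str, chunks: list[str], workers: int = 4):
    # Map over all chunks in parallel rather than sequentially.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: process_chunk(document, c), chunks))
```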

Sending my best to the little one in the background!

BinWang-bf

Thanks, very interesting. Many ideas came to mind for improving RAG by enhancing chunks.

tomwawer

Applying this to local models for large document repos seems like a good combo to increase RAG performance. I wonder how you would optimize for the local environment.

IAMCFarm

Working with this now and didn’t use the new caching method 😫. Nice to have someone else run through this 🎉😆

seanwood

Do you maybe know what is going on in GPT Assistants? Their RAG is really efficient and accurate: they use a default of 800-token chunks with 400-token overlap, and it seems to work really well. Perhaps they also use some kind of re-ranker? Maybe you know.

MatichekYoutube

Thought it was my baby, but it was yours in the background 😂😂

anubisai

Best wishes for the kid in the background

megamehdi

Thanks for the easy-to-understand explanation.

vikramn

What if the document is so big that it can't fit in the LLM context window? How do we get the contextual chunks then?
If we break the document into smaller segments/documents to implement this approach, won't it lose some context?

SunilM-xo

Hasn't structured graphRAG already solved this? Find the structured data using a graph, then navigate it to pull the exact example?

ic_jason

I was looking all over in the house and outside for the source of that sound. I thought the neighborhood cats were having a conference on my porch!

aaronjsolomon

How is the diagram generated/built at 0:48 for RAG embeddings?

PeterJung-cxib

What happens if the document contains a lot of images, such as tables, charts, and so on? Can we still chunk the document in the normal way, e.g., by setting a chunk size?

limjuroy

I think the baby in the background disagrees :p

jackbauer

How do you generate the context for a chunk without giving the LLM sufficient information about that chunk? How are they getting the information about the revenue numbers in that example? If it is extracted from the whole document, the LLM cost will be painful.

souvickdas

I want to add this as the default way RAG is handled in Open WebUI, but it's conflicting with other stuff. I tried to make a custom pipeline for it, but I'm struggling to make it work. Is it out of the scope of Open WebUI, or am I just not understanding the documentation properly?

DRMEDAHMED

Can someone suggest something for larger documents (above 500 pages)? Normal RAG is not very accurate, and we can't use contextual RAG because we need to pass the whole document in the prompt, which exceeds the token limit.

HarmanSingh-wpin

What tool did you use to record the video?

LatifAmars

Losing the context in RAG is a real issue that can destroy all usefulness.
I have read that a combination of chunks and graphs is a way to overcome that.
But I have not tested it with a use case myself yet.

konstantinlozev