Mastering Retrieval for LLMs - BM25, Fine-tuned Embeddings, and Re-Rankers

VIDEO RESOURCES:

TIMESTAMPS:
0:00 Mastering Retrieval (RAG) for LLMs
0:44 Video Overview
13:19 Baseline Performance with No Retrieval
17:29 Document Chunking - Naive vs Sentence based
24:34 BM25
33:20 Semantic / Vector / Embeddings Search
39:59 Cosine vs Dot Product Similarity
43:21 Generating Chunks and Embeddings
50:50 Running BM25 and Similarity Retrieval
55:22 Performance with BM25 vs Similarity
58:36 Fine-tuning embeddings / encoders
1:04:00 Preparing fine-tuning datasets
1:14:54 Embeddings Training Continued
1:22:00 Performance after Fine-tuning
1:25:58 Re-rankers
1:27:10 Cross-encoders
1:30:47 LLM re-rankers
1:36:11 Re-ranking performance
1:48:50 Final Tips

COMMENTS:

Thank you, Trellis!!! Awesome video as always, probably one of the best technical channels right now. Best!

sergialbert

Amazing, this video is a treasure! Thanks a lot for explaining everything in depth. Great job!

MortaAriyano

Amazing video, really the best explanation of the RAG pipeline I've seen on YT. Great job!

TheMariolino

You are the best. Can't wait to try this out over the weekend!

KopikoArepo

Hey Trellis, you may only have about 10k subs, but I really appreciate all your videos. I personally learn and benefit a lot from them, and I always recommend your videos to a friend of mine whenever a detailed explanation is needed. I do have some questions about this video, which is probably my favourite so far, and I'm trying to understand it in every possible way.

1) How can we know whether a model was trained using dot product or cosine similarity?
2) Can you explain whether the dot product used when calculating cosine similarity is the same dot product you were comparing with earlier? Also, could you give an example of normalizing, and could they standardize instead of normalize? I've always been confused about those terms and I'm not sure whether they are related in this case. "In terms of computation, it's quicker to do dot products, because for cosine you're finding the angle between two vectors, which means first doing a dot product and then normalizing."
3) Regarding the retrieval performance, is there any reason why you picked the top 12 chunks? Also, does that mean that if I tried the top 20 chunks I could achieve near 100% accuracy?

seththunder
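
On the cosine vs. dot product question above: cosine similarity is the dot product of the two vectors after each has been L2-normalised (scaled to unit length), which is a different operation from statistical standardisation (subtracting the mean and dividing by the standard deviation). A minimal numpy sketch with toy vectors, purely illustrative:

import numpy as np

a = np.array([3.0, 4.0])   # toy "query" embedding
b = np.array([1.0, 2.0])   # toy "chunk" embedding

dot = np.dot(a, b)   # raw dot product: 11.0 (vector magnitude matters)
cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine similarity: ~0.984

a_unit = a / np.linalg.norm(a)   # L2 normalisation to unit length
b_unit = b / np.linalg.norm(b)
dot_of_units = np.dot(a_unit, b_unit)   # equals the cosine similarity: ~0.984

print(dot, cos, dot_of_units)

So if an embedding model outputs already-normalised vectors, dot product and cosine similarity give identical rankings; they only differ when vector length carries information.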

Loved the depth you went into, really enjoyed the video! Quick one: if you were to apply RAG to a dataframe, how would you go about it? Converting to strings and then embedding each row as a chunk feels clunky, but maybe that's the way to go? I guess with the context lengths available at this stage we could almost just convert entire dataframes to strings and feed them in.

SeánCarmody-yp
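
On the dataframe question above, one hedged sketch of the row-as-chunk approach (assumes pandas and sentence-transformers are installed; the model name is just an illustrative choice):

import pandas as pd
from sentence_transformers import SentenceTransformer

df = pd.DataFrame({"player": ["A", "B"], "points": [31, 12]})   # toy dataframe

# Serialise each row as "column: value" pairs so the text keeps its column context.
row_texts = df.apply(
    lambda row: "; ".join(f"{col}: {row[col]}" for col in df.columns), axis=1
).tolist()

model = SentenceTransformer("all-MiniLM-L6-v2")   # illustrative embedding model
row_embeddings = model.encode(row_texts, normalize_embeddings=True)
print(row_texts, row_embeddings.shape)

Whether to embed per row or dump the whole dataframe into the prompt mostly comes down to table size and how precise the retrieval needs to be.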

Most interesting! Thank you for sharing this video with us. I would be most interested if you could try something like LLMLingua to compress the context. Actually, I was wondering about using that on the chunks to make them more efficient. Also, to get a response that could be checked against the knowledge source of the RAG system, I'd be interested in an LLM that can give citations of the relevant source chunks (assigning IDs when chunking, before any compression). Do you have any experience with that? How hard would it be to fine-tune a model for RAG with citations? Thanks!

testcomptetest
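
On the citations idea above, a minimal sketch of assigning chunk IDs at chunking time and asking the model to cite them; the prompt wording here is an assumption, not something from the video:

# Tag each chunk with a stable ID when chunking (before any compression).
chunks = ["The grant covers travel costs.", "Claims must be filed within 30 days."]
tagged = [f"[chunk-{i}] {text}" for i, text in enumerate(chunks)]

context = "\n".join(tagged)
prompt = (
    "Answer using only the context below, and cite the chunk IDs you relied on, "
    "e.g. (chunk-1).\n\n"
    f"Context:\n{context}\n\n"
    "Question: How long do I have to file a claim?"
)
print(prompt)   # send this to whichever LLM you are using

Because the IDs travel with the chunks, any citation in the answer can be checked against the original source text.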

Have you tried your pipeline on a different dataset for the test data? Maybe something like basketball rules instead.

cheeyuanng

Thanks for the great video! How does this solution scale? I can see the benefit of fine-tuning the embeddings for smaller data corpora, but does it do as well for large corpora that have thousands of documents across different domains of knowledge? And does the fine-tuning still help if more documents from a different domain are added later?

unshadowlabs

Hi. Nice video, but I didn't get how to prepare a dataset. How do I get a comprehensive list of questions and answers about my document?

VerdonTrigance
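
On the dataset-preparation question above, one common approach (not necessarily the exact one used in the video) is to have an LLM write a question for each chunk, giving (question, positive chunk) pairs to fine-tune the embedding model on. A sketch using the OpenAI client; the model name and prompt are assumptions:

from openai import OpenAI

client = OpenAI()   # assumes OPENAI_API_KEY is set in the environment
chunks = ["Players may substitute at any dead ball.", "Each quarter lasts 12 minutes."]

pairs = []
for chunk in chunks:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model choice
        messages=[{
            "role": "user",
            "content": f"Write one short question that this passage answers:\n\n{chunk}",
        }],
    )
    question = resp.choices[0].message.content.strip()
    pairs.append({"question": question, "positive": chunk})

print(pairs)   # use these pairs as training data for the embedding model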

Hey bro, have you experimented with GraphRAG? Appreciate the video. Learning more about RAG every day.

awakenwithoutcoffee

Thanks for this!
What changes would you implement if there were a large number (50+) of PDFs (100+ pages each, with embedded images and text)?

jeevanable

Can you discuss GraphRAG, recently released by Microsoft?

wryltxw

- If the re-ranker is only good for similarity results, why not apply it to those only and then add the BM25 results afterwards?
- Why not also fine-tune the re-ranker?

loicbaconnier
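
On the first question above, a sketch of one possible arrangement (not necessarily what the video recommends): re-rank only the vector-search candidates with a cross-encoder, then union the result with the BM25 hits. The cross-encoder model name is an illustrative choice:

from sentence_transformers import CrossEncoder

query = "How long is each quarter?"
vector_hits = ["Each quarter lasts 12 minutes.", "Players may substitute at any dead ball."]
bm25_hits = ["Overtime periods last 5 minutes.", "Each quarter lasts 12 minutes."]

# The cross-encoder scores each (query, document) pair jointly.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in vector_hits])
reranked = [doc for _, doc in sorted(zip(scores, vector_hits), reverse=True)]

# Merge: re-ranked semantic hits first, then any BM25 hits not already present.
merged = reranked + [doc for doc in bm25_hits if doc not in reranked]
print(merged)

A cross-encoder will happily score BM25 hits too, so re-ranking the combined candidate pool is the other obvious variant to test.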

Any advantage in using BM25 instead of scikit-learn's TF-IDF?

Bragheto
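
For context on the question above, a small sketch contrasting the two scorers; BM25 (here via the rank_bm25 package) adds term-frequency saturation and document-length normalisation on top of what a plain TF-IDF cosine score gives you, which is why it usually does better on keyword-style retrieval:

from rank_bm25 import BM25Okapi
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["the cat sat on the mat", "dogs chase cats", "the stock market fell"]
query = "cat on a mat"

# BM25 scores (higher is better)
bm25 = BM25Okapi([doc.split() for doc in corpus])
bm25_scores = bm25.get_scores(query.split())

# TF-IDF cosine scores for the same query
vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(corpus)
query_vec = vectorizer.transform([query])
tfidf_scores = cosine_similarity(query_vec, doc_vecs)[0]

print(bm25_scores, tfidf_scores)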

Trellis, I have one problem. I am working on an NL-to-SQL task. I have written column descriptions for each column of every table in my database, then converted those descriptions into embeddings and stored them. When a user question comes in, I convert it into an embedding and multiply it with the embedding of each column description created earlier, then select the top 20 columns by cosine similarity score. But I usually miss one or two relevant columns this way. The question is just one line, so it doesn't include many details for retrieving the relevant columns, and sometimes irrelevant columns get a higher cosine score than the relevant ones. Do you have any idea how I can approach this? The only solution I see is increasing the number of columns I select, but that increases the size of the prompt I give the LLM, and, as you know, LLMs have a limited context window.

TemporaryForstudy
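
One direction for the column-selection problem above: keep the cheap embedding pass but over-retrieve (say the top 50 columns), then let a cross-encoder re-score the question against each candidate description and keep a smaller top set. This is a sketch, not a recommendation from the video; the model names and candidate counts are illustrative assumptions:

from sentence_transformers import SentenceTransformer, CrossEncoder

question = "Which customers placed orders last month?"
column_descriptions = {
    "customers.name": "Full name of the customer",
    "orders.order_date": "Date the order was placed",
    "orders.customer_id": "Customer who placed the order",
    "products.weight_kg": "Shipping weight of the product",
}
cols = list(column_descriptions)
descs = [column_descriptions[c] for c in cols]

# Stage 1: embedding similarity builds a generous candidate set.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
sims = (embedder.encode([question], normalize_embeddings=True)
        @ embedder.encode(descs, normalize_embeddings=True).T)[0]
candidates = [c for _, c in sorted(zip(sims, cols), reverse=True)][:50]

# Stage 2: a cross-encoder re-scores the question against each candidate description.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(question, column_descriptions[c]) for c in candidates])
top_columns = [c for _, c in sorted(zip(scores, candidates), reverse=True)][:10]
print(top_columns)

Rewriting or expanding the one-line question before the embedding pass (for example, asking an LLM to list the entities and filters it implies) is another cheap way to recover columns that pure cosine similarity misses.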