The Hidden Cost of Embeddings in RAG and How to Fix It

Embeddings are crucial for a production-ready RAG system but often get overlooked. I cover the costs, storage considerations, and ways to reduce storage requirements using techniques like dimensionality reduction and quantization. Learn how these methods can improve speed and save costs without compromising too much on performance.
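As a minimal sketch of the storage savings the video discusses, here is binary quantization of float32 embeddings with NumPy. The matrix is random, stand-in data; the 1536-dimension size matches OpenAI's text-embedding-3-small, but the technique applies to any embedding model:

```python
import numpy as np

# Hypothetical batch of 1000 float32 embeddings, 1536 dims each
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 1536)).astype(np.float32)

# Binary quantization: keep only the sign of each dimension,
# then pack 8 bits per byte
binary = np.packbits(embeddings > 0, axis=1)

print(embeddings.nbytes)  # 1000 * 1536 * 4 bytes = 6,144,000
print(binary.nbytes)      # 1000 * 1536 / 8 bytes  =   192,000 (32x smaller)
```

Searching against binary vectors uses Hamming distance instead of cosine similarity, usually followed by a rescoring step on the original vectors to recover accuracy.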

LINKS:

💻 RAG Beyond Basics Course:

Let's Connect:

Sign up for the newsletter, localgpt:

00:00 Introduction to Embeddings in RAG Systems
00:47 Understanding Embedding Costs
01:17 Storage Costs and Considerations
03:32 Reducing Storage Needs
03:41 Dimensionality Reduction Techniques
04:24 Matryoshka Representation Learning
05:14 Precision Reduction Techniques
06:28 Quantization Study by Hugging Face
10:07 Implementing Quantization in Your Pipelines
12:56 Using Open Source Vector Stores
15:01 Conclusion and Final Thoughts
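The precision-reduction step covered around 05:14 can be sketched as int8 scalar quantization. This is a hedged NumPy illustration with made-up vectors, not the exact pipeline from the video:

```python
import numpy as np

rng = np.random.default_rng(42)
emb = rng.standard_normal((500, 768)).astype(np.float32)  # hypothetical embeddings

# int8 scalar quantization: map each float32 value into [-127, 127]
# using a single scale derived from the corpus
scale = np.abs(emb).max() / 127.0
emb_int8 = np.clip(np.round(emb / scale), -127, 127).astype(np.int8)

# Dequantize for scoring; storage drops 4x, quality loss is usually small
emb_restored = emb_int8.astype(np.float32) * scale
print(emb.nbytes // emb_int8.nbytes)  # 4
```

Production vector stores typically use a per-dimension scale and a quantile cutoff instead of the global max, which reduces the error introduced by outlier values.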

All Interesting Videos:

Comments

Use Embedding-3-small + Qdrant Quantization for saving storage costs.
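A sketch of the setup this comment suggests: a Qdrant collection with scalar (int8) quantization enabled via the qdrant-client package. The collection name is hypothetical, and the snippet assumes a Qdrant instance running at localhost:6333 (it is a configuration fragment, not a full pipeline):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")
client.create_collection(
    collection_name="docs",  # hypothetical name
    vectors_config=models.VectorParams(
        size=1536,  # text-embedding-3-small's default dimensionality
        distance=models.Distance.COSINE,
    ),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            quantile=0.99,    # clip outliers before quantizing
            always_ram=True,  # keep quantized vectors in RAM, originals on disk
        )
    ),
)
```

With `always_ram=True`, only the 4x-smaller int8 vectors need to fit in memory, while the full-precision originals stay on disk for rescoring.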

greendsnow

Pretty good! Very useful, as I never thought about the long-term wallet bleeding.

BadBite

Thank you very much! This is very good to know as our app gets bigger.

uwegenosdude

Brilliant and extremely useful and relevant information as usual. Thanks!

aibeginnertutorials

That’s great! Yes, please create a video with a useful example. I‘d appreciate it! 🎉🎉

MeinDeutschkurs

Very interesting and important points you raised. I've seen startups completely unaware of this and, as a result, they're doomed. Many don't even use features like OpenAI's dimension reduction. Binary and int8 quantization have been around since March and are incredibly powerful. Now, with Gemini's support for PDFs and long context windows, offering up to a billion tokens a day, it raises the question of when to use embeddings and RAG, and when not to. When necessary, combining this with a long context window seems like the perfect solution. I suggest you create a video showing how to use this with Gemini to fetch and cache context, which would deliver the best balance of performance and cost.
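The "dimension reduction" mentioned in this comment (OpenAI's `dimensions` parameter on the text-embedding-3 models, enabled by Matryoshka-style training) amounts to truncating an embedding and renormalizing it. A NumPy sketch with made-up unit vectors standing in for real API output:

```python
import numpy as np

rng = np.random.default_rng(7)
full = rng.standard_normal((100, 1536)).astype(np.float32)
full /= np.linalg.norm(full, axis=1, keepdims=True)  # unit-length, like API output

def truncate(embs, dims):
    # Keep the leading dims and renormalize so cosine similarity still works
    small = embs[:, :dims]
    return small / np.linalg.norm(small, axis=1, keepdims=True)

small = truncate(full, 256)
print(small.shape)                  # (100, 256)
print(full.nbytes // small.nbytes)  # 6x less storage
```

This only preserves quality for models trained with Matryoshka Representation Learning, where the leading dimensions are deliberately the most informative; truncating an ordinary embedding model this way degrades retrieval much more.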

unclecode

Thanks for your very useful information.

sashirestela

Thank you, waiting for a real tutorial for a production RAG app.

messam

Just a question: I see you are mentioning, e.g., the AWS X2gd EC2 instance. So if I understand correctly, you want to keep all the vectors in memory. Isn't it better to just use a storage-backed solution instead if the database is massive, e.g. Amazon OpenSearch Service? Storage should be cheap...

jirikosek

Yes, this is exactly what I'm looking for

hsin-yusu

Please make a video on hybrid search using the BM25 algorithm.

harshilpatel

Very helpful.
One question: can you explain the difference between quantization as used with embedding models (here) and quantization as used during inference or fine-tuning?

abdulrehmanbaber

Pinecone storage is about $0.334 per GB per month, but there are also separate fees for reads and writes.
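Plugging the $0.334/GB/month figure from this comment into a quick back-of-the-envelope calculator (the vector counts and dimensions below are hypothetical, and read/write fees are not included):

```python
def monthly_storage_cost(num_vectors, dims, bytes_per_dim=4, price_per_gb=0.334):
    """Rough monthly storage cost for float32 embeddings at a flat $/GB rate."""
    gb = num_vectors * dims * bytes_per_dim / 1024**3
    return gb * price_per_gb

# e.g. 10 million 1536-dim float32 vectors: ~57 GB
print(round(monthly_storage_cost(10_000_000, 1536), 2))  # 19.11
```

Switching `bytes_per_dim` to 1 (int8) or 1/8 (binary) shows how directly quantization translates into a smaller monthly bill.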

jayco

Using QDrant on our servers, RAM will be our largest expense for maintaining the database as it grows.

mattshelley