Building Production-Ready RAG Applications: Jerry Liu

Large Language Models (LLMs) are starting to revolutionize how users can search for, interact with, and generate new content. Recent stacks and toolkits around Retrieval Augmented Generation (RAG) have emerged in which users build applications such as chatbots using LLMs on their own private data. This opens the door to a vast array of applications. However, while setting up a naive RAG stack is easy, productionizing it is hard. In this talk, we cover core techniques for evaluating and improving your retrieval systems for better-performing RAG.
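The naive stack the talk starts from can be sketched end to end: retrieve the most similar chunks, then inject them into the prompt. Everything below is illustrative, not LlamaIndex's API: `embed` is a toy character-frequency stand-in for a real embedding model, and the prompt template is a generic one.

```python
import math

def embed(text):
    # Toy embedding: a character-frequency vector. A real pipeline would
    # call an embedding model here; this stand-in keeps the sketch runnable.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=2):
    # Rank chunks by similarity to the query and keep the top k.
    qv = embed(query)
    return sorted(corpus, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    # Inject retrieved context into the prompt (the "data pipeline" fix).
    context = "\n---\n".join(chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

corpus = [
    "LlamaIndex provides data connectors for ingestion.",
    "Retrieval augmented generation injects retrieved context into the prompt.",
    "Fine-tuning bakes knowledge into the model weights.",
]
prompt = build_prompt(
    "What does retrieval augmentation do?",
    retrieve("retrieval augmentation", corpus),
)
```

Swapping `embed` for a real model and the f-string for an LLM call turns this into the basic ingestion/querying loop the talk then improves on.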

About Jerry Liu
Jerry Liu, the co-founder and CEO of LlamaIndex, brings a wealth of expertise to his role, with a career spanning ML engineering, AI research, and startups. Prior to his current position, he served as an ML engineer at Quora and conducted AI research at Uber ATG. A Princeton alumnus, Jerry has authored several publications, including his most recent works, Deep Structured Reactive Planning and MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models, reflecting his commitment to the field.
Comments

So far the most complete and clear LLM RAG walkthrough video on YouTube.

joxa

00:00:49 Fix the model by creating a data pipeline to add context into the prompt.
00:01:33 Understand the paradigms of retrieval augmentation and fine-tuning for language models.
00:02:00 Learn about building a QA system using data ingestion and querying components.
00:02:07 Explore lower-level components to understand data ingestion and querying processes.
00:03:01 Address challenges with naive RAG applications, such as poor response quality.
00:04:02 Improve retrieval performance by optimizing data storage and pipeline.
00:04:14 Enhance the embedding representation for better performance.
00:04:45 Implement advanced retrieval methods like reranking and recursive retrieval.
00:05:18 Incorporate metadata filtering to add structured context to text chunks.
00:06:27 Experiment with small to big retrieval for more precise retrieval results.
00:07:14 Consider embedding references to parent chunks for improved retrieval.
00:09:31 Explore the use of agents for reasoning and more advanced analysis.
00:12:12 Fine-tune the RAG system to optimize specific components for better performance.
00:17:01 Generate a synthetic query dataset from raw text chunks using LLMs to fine-tune an embedding model.
00:17:12 Fine-tune the base model itself or fine-tune an adapter on top of the model to improve performance.
00:17:16 Consider fine-tuning an adapter on top of the model, as it has advantages such as not requiring the base model's weights and avoiding the need to reindex the entire document corpus when fine-tuning query embeddings.
00:18:00 Explore the idea of generating a synthetic dataset using a bigger model like GPT-4 and distilling it into a weaker LLM like GPT-3.5 Turbo to improve chain of thought, response quality, and structured outputs.
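The small-to-big idea from the 00:06:27 and 00:07:14 entries can be sketched as follows: index small units (sentences) for precise matching, but return the larger parent chunk they point to for synthesis. The keyword-overlap scorer below is a hypothetical stand-in for embedding similarity, and the parent texts are invented.

```python
# Parent chunks: the larger units actually handed to the LLM.
parents = {
    "p1": "RAG pipelines ingest documents, chunk them, and embed each chunk. "
          "Retrieval then finds chunks relevant to a query.",
    "p2": "Fine-tuning an adapter on top of a frozen embedding model avoids "
          "reindexing the corpus, since document embeddings stay fixed.",
}

# Small units: (sentence, parent_id) pairs derived from each parent.
small_index = []
for pid, text in parents.items():
    for sent in text.split(". "):
        if sent:
            small_index.append((sent.strip().rstrip("."), pid))

def retrieve_small_to_big(query, k=1):
    # Score small units (keyword overlap stands in for embedding similarity),
    # then follow each match's reference back to its parent chunk.
    q = set(query.lower().split())
    scored = sorted(
        small_index,
        key=lambda item: len(q & set(item[0].lower().split())),
        reverse=True,
    )
    # De-duplicate parents while preserving rank order.
    seen, out = set(), []
    for _, pid in scored:
        if pid not in seen:
            seen.add(pid)
            out.append(parents[pid])
        if len(out) == k:
            break
    return out

result = retrieve_small_to_big("why does adapter fine-tuning avoid reindexing")
```

The design point is the indirection: matching happens on granular text for precision, while the coarser parent gives the LLM enough context to answer.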

ReflectionOcean

So far this is the best presentation on RAG I have come across in the last couple of months.

venkat

🎯 Key Takeaways for quick navigation:

01:44 🧩 *The current RAG stack for building a QA system consists of two main components: data ingestion and data querying (retrieval and synthesis).*
03:08 🚧 *Challenges with naive RAG include issues with response quality, bad retrieval, low precision, hallucination, fluff in return responses, low recall, and outdated information.*
04:31 🔄 *Strategies to improve RAG performance involve optimizing various aspects, including data, retrieval algorithm, and synthesis. Techniques include storing additional information, optimizing data pipeline, adjusting chunk sizes, and optimizing embedding representation.*
06:50 📊 *Evaluation of RAG systems involves assessing both retrieval and synthesis. Retrieval evaluation includes ensuring returned content is relevant to the query, while synthesis evaluation examines the quality of the final response.*
08:30 🛠️ *To optimize RAG systems, start with "table stakes" techniques like tuning chunk sizes, better pruning, and metadata filters integrated with vector databases.*
12:29 🧐 *Advanced retrieval methods, such as small-to-big retrieval and embedding a reference to the parent chunk, can enhance precision by retrieving more granular information.*
14:42 🧠 *Exploring more advanced concepts, like multi-document agents, allows for reasoning beyond synthesis, enabling the modeling of documents as sets of tools for tasks such as summarization and QA.*
16:23 🎯 *Fine-tuning in RAG systems is crucial to optimize specific components, such as embeddings, for better performance. It involves generating synthetic query datasets and fine-tuning on either the base model or an adapter on top of the model.*
18:15 📚 *Documentation on production RAG and fine-tuning, including distilling knowledge from larger models to weaker ones, is available for further exploration.*

Made with HARPA AI
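The retrieval-evaluation point at 06:50 commonly boils down to two metrics, hit rate and MRR, computed over a (query, expected chunk) dataset such as the synthetic one described later in the talk. The `toy_retriever` and its fixed rankings below are invented purely for illustration.

```python
def evaluate_retrieval(retriever, dataset, k=3):
    # For each (query, expected_chunk_id) pair, check whether the expected
    # chunk appears in the top-k results and at what rank.
    hits, reciprocal_ranks = 0, []
    for query, expected_id in dataset:
        results = retriever(query)[:k]  # ranked list of chunk ids
        if expected_id in results:
            hits += 1
            reciprocal_ranks.append(1.0 / (results.index(expected_id) + 1))
        else:
            reciprocal_ranks.append(0.0)
    n = len(dataset)
    return {"hit_rate": hits / n, "mrr": sum(reciprocal_ranks) / n}

def toy_retriever(query):
    # Hypothetical retriever returning fixed rankings for illustration.
    return {"q1": ["c1", "c2", "c3"], "q2": ["c9", "c2", "c7"]}[query]

metrics = evaluate_retrieval(toy_retriever, [("q1", "c1"), ("q2", "c2")])
# q1 hits at rank 1 (RR 1.0), q2 hits at rank 2 (RR 0.5):
# hit_rate = 1.0, mrr = 0.75
```

Running this over two retriever configurations (say, two chunk sizes) is the simplest way to turn the talk's "define a benchmark" advice into a concrete A/B comparison.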

kashishmukheja

Thank you not just for putting this together, but for making sense of it all! In 18 min!? Amazing!

streetchronicles

I was thoroughly impressed by the depth of your insights and the clarity of your delivery. Jerry Liu's ability to distill complex concepts into understandable terms was remarkable, and I particularly enjoyed how you illustrated the practical applications of RAG in various fields.

Would it be possible for you to share the slides from Jerry Liu's presentation?

UncleDao

Very deep talk! Really appreciated it and learned a lot.

minwang

Your distilled video has almost no knowledge loss over hours of coursework. Great work!

Bball

I thoroughly enjoyed your presentation. Jerry Liu, thanks for the deep methods to be applied to traditional RAG.

gopikrishna

Really nice presentation skills, Jerry!

Ke_Mis

Thank you very much for this. In this age of LLMs it is getting more and more important to be able to measure their accuracy and efficacy. I've been working with problems like this since the beginning of 2024 and it's been such an interesting topic to learn about.
Cheers and thanks for the upload.

MatBat__

Very nice presentation and very practical tips for enterprise RAGs

bhaskartripathi

Thank you for this excellent presentation, very much appreciated

anne-marieroy

I love Jerry's approach to identifying intuition and solution

justy

Short and sweet presentation. Very clear.

jasonzhang

Thanks for your hard work. Really learned a lot.

RealUniquee

Are there any take-aways here that can help an average user generate better results using a standard UI?

shopbc

I use the hyper-naive approach: provide the LLM with all the knowledge keys in my MySQL DB and let it tell me which ones are most likely to be helpful for answering the current prompt. Then just load the entries based on the keys the LLM told me and inject them into the second prompt, which the LLM is then supposed to answer. (Yes, vector search would be way more fitting for this, but I'm a peasant and don't have the slightest clue of how to implement it)
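For what it's worth, that two-pass key-selection scheme is easy to sketch. Below, a keyword heuristic stands in for the first LLM call and an in-memory dict stands in for the MySQL table; the keys and entries are made up.

```python
# Stand-in for the MySQL table of (key, entry) rows.
knowledge = {
    "shipping_policy": "Orders ship within 2 business days.",
    "return_policy": "Returns are accepted within 30 days.",
    "pricing": "The base plan costs $10/month.",
}

def ask_llm_for_keys(prompt, keys):
    # Pass 1: in the real setup, the LLM sees all keys and picks the
    # relevant ones. A substring heuristic stands in for that call here.
    return [k for k in keys if any(w in k for w in prompt.lower().split())]

def answer(prompt):
    # Pass 2: load only the selected entries and assemble the second
    # prompt, which the real LLM would then answer.
    keys = ask_llm_for_keys(prompt, knowledge.keys())
    context = "\n".join(knowledge[k] for k in keys)
    return f"Context:\n{context}\n\nQuestion: {prompt}"

second_prompt = answer("what is your return policy")
```

This is essentially retrieval with the LLM as the ranker; the later comments about letting the model drive retrieval describe the same idea taken further.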

holonaut

🎯 Key Takeaways for quick navigation:

00:01 🎤 *Video introduction*
00:23 📚 *LLM use cases*
01:03 🔍 *Two main approaches to giving LLMs data understanding*
- Retrieval augmentation: add context from data sources into the language model's input prompt.
- Fine-tuning: bake knowledge into the model by training its weights.
01:44 📊 *Building RAG*
- The RAG architecture covers data ingestion and data querying, including retrieval and synthesis.
03:08 🚧 *Challenges of RAG*
05:27 🧪 *Evaluating RAG systems*
- Covers evaluation methods for RAG systems, including retrieval evaluation and synthesis evaluation.
- Emphasizes the importance of defining benchmarks to measure performance.
08:30 🧩 *Optimizing RAG systems*
16:23 🔄 *Fine-tuning and future outlook*

Made with HARPA AI

chendeheng

RAG is an interesting idea. If the predictions are right and these models are only going to get better, wouldn’t it make sense to give them direct access to the embedding DB and let the model decide how best to handle retrieval rather than having the humans do it?
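One way to read that question is as the router/agent setup from the talk: expose each index as a tool and let the model decide which to call. In the sketch below, the heuristic `route` function is a stub for the LLM's tool choice, and the tool names and fake results are invented for illustration.

```python
# Each retrieval strategy exposed as a callable "tool". The lambdas return
# fake results; real tools would hit a vector DB, keyword index, etc.
TOOLS = {
    "vector_search": lambda q: f"[chunks similar to '{q}']",
    "keyword_search": lambda q: f"[chunks containing '{q}']",
    "summary_index": lambda q: "[document-level summary]",
}

def route(query):
    # In a real agent, the LLM itself picks the tool from descriptions of
    # each one; this keyword heuristic stands in for that decision.
    if "summary" in query.lower() or "overview" in query.lower():
        return "summary_index"
    if '"' in query:  # quoted phrase suggests exact-match lookup
        return "keyword_search"
    return "vector_search"

def agentic_retrieve(query):
    tool = route(query)
    return tool, TOOLS[tool](query)

tool, result = agentic_retrieve("give me an overview of the talk")
```

Replacing `route` with an actual LLM call is exactly the "model decides how best to handle retrieval" idea, with the embedding DB behind one of the tools rather than hard-wired into the pipeline.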

RyanStuart