How to evaluate an LLM-powered RAG application automatically.

Source code of this example:

I teach a live, interactive program that'll help you build production-ready machine learning systems from the ground up. Check it out here:

To keep up with the content I create:
Comments

Hey Santiago! Just wanted to drop a comment to say that you're absolutely killing it as an instructor. Your way of breaking down the code and the whole process into simple, understandable language is pure gold, making it accessible for newcomers like me. Wishing you all the success and hoping you keep blessing the community with your valuable content!
Aside from the teaching side, have you tried to create a micro-SaaS based on these technologies? It seems you're halfway there, and it could be a great opportunity to expand your business.

aleksandarboshevski

THANK YOU
I greatly appreciate the release of the new videos. The clarity of the explanations and the logical sequence of the content are exceptional.

TooyAshy-

This is my first time watching your videos. It is great. Thank you.

mohammedsuliman

Thanks! Exactly what I was looking for. I've been cracking my head over how on earth to test a RAG system. How is the business going to give me 1,000+ questions to test, and how can a human verify every response? Top content.

AmbrishYadav

Oh man, the way you explained these complex topics is mind-blowing. I just wanted to say thank you for making videos like these.

dikshantgupta

FYI, keep an eye on the mic volume levels! It sounds like the audio was clipping.

TheScott

Hello Santiago, your explanation was thorough and I understood it really well. Now I have a question: is there any tool other than Giskard (one that is open source and does not require an OpenAI API key) to evaluate my LLM or RAG model?
Thank you in advance 😊

hhdofno

Love the video. Great breakdown. I'd like to see more detail on the evaluation results (e.g., is a score of 0.73 good? WTH...!?), how tweaking the pipeline changes the eval results, and a comparison of, say, Ragas versus Giskard.

peterhjvaneijk

We appreciate your work a lot, my man.

TPH

Glad to see you brought in pytest at the end; it's like a surprise dessert 🍰 after a great meal.

liuyan
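The pytest pattern mentioned above can be sketched in a few lines. Everything in this sketch is an illustrative placeholder, not Giskard's or the video's actual code: a real suite would load the generated testset, call the real RAG pipeline, and use an LLM judge instead of the crude keyword-overlap check used here.

```python
# Sketch: running an LLM-generated testset through pytest-style checks.
# TESTSET, rag_answer, and is_correct are illustrative stand-ins.

# Hypothetical testset: (question, reference answer) pairs, e.g. exported
# from a test-case generator such as Giskard's.
TESTSET = [
    ("What database does the demo use?",
     "The demo uses Pinecone as its vector store."),
    ("Which framework orchestrates the chain?",
     "LangChain orchestrates the chain."),
]

def rag_answer(question: str) -> str:
    """Stand-in for the real RAG pipeline."""
    canned = {
        "What database does the demo use?":
            "It uses Pinecone as the vector store.",
        "Which framework orchestrates the chain?":
            "LangChain orchestrates the chain.",
    }
    return canned[question]

def is_correct(answer: str, reference: str, threshold: float = 0.5) -> bool:
    """Crude proxy for an LLM judge: fraction of reference words present."""
    ref_words = {w.strip(".,").lower() for w in reference.split()}
    ans_words = {w.strip(".,").lower() for w in answer.split()}
    return len(ref_words & ans_words) / len(ref_words) >= threshold

# With pytest installed, the same loop becomes one test per question:
#   @pytest.mark.parametrize("question,reference", TESTSET)
#   def test_rag(question, reference):
#       assert is_correct(rag_answer(question), reference)
results = {q: is_correct(rag_answer(q), ref) for q, ref in TESTSET}
```

The payoff of the pytest framing is that each generated question becomes its own named, independently failing test case, so a regression in the pipeline points straight at the questions it broke.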

Superb video. Great content from start to finish. Thank you.

tee_iam

Damn, you explained each step really well! Love it!

maxnietzsche

Great stuff!

What are your preferred open-source alternatives to all the tools used in this tutorial?

horyekhunley

Super important topic you covered here man!

alextiger

Great stuff, Santiago! You've used Giskard to create the test cases, but these test cases are themselves created using an LLM. In a real application, would we have to manually vet the test cases to ensure they are 100% accurate?

CliveFernandesNZ
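On the vetting question above: one cheap, automatic first pass is to check whether each generated reference answer is actually grounded in the source chunk it was generated from, and send only the suspicious ones to a human. The following is a minimal sketch of that idea under stated assumptions; the case schema, the word-overlap score, and the 0.6 threshold are all illustrative, not part of Giskard.

```python
# Sketch: a cheap groundedness filter to run over LLM-generated test
# cases before manual review. Cases whose reference answer shares too
# little vocabulary with their source chunk get flagged as possibly
# hallucinated. Field names and thresholds are illustrative.

def grounding_score(answer: str, source_chunk: str) -> float:
    """Fraction of answer words that also occur in the source chunk."""
    ans = {w.strip(".,;:").lower() for w in answer.split()}
    src = {w.strip(".,;:").lower() for w in source_chunk.split()}
    return len(ans & src) / len(ans) if ans else 0.0

def vet_testset(cases: list[dict], min_score: float = 0.6):
    """Split generated cases into auto-accepted and flagged-for-review."""
    accepted, flagged = [], []
    for case in cases:
        score = grounding_score(case["reference_answer"], case["source_chunk"])
        (accepted if score >= min_score else flagged).append(case)
    return accepted, flagged

cases = [
    {"question": "What store is used?",
     "reference_answer": "The demo stores embeddings in Pinecone.",
     "source_chunk": "The demo stores all document embeddings in Pinecone."},
    {"question": "What is the context window?",
     "reference_answer": "The model supports a 128k context window.",
     "source_chunk": "The demo stores all document embeddings in Pinecone."},
]
accepted, flagged = vet_testset(cases)
```

A filter like this does not make the testset 100% accurate, but it shrinks the pile a human has to read, which is usually the practical compromise.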

Just awesome instruction, Santiago. I am a beginner, but you make learning digestible and clear! Sorry if this is an ignorant question, but is it possible to substitute FAISS, Postgres, MongoDB, Chroma DB, or another free, open-source option for Pinecone to save money, and if so, which would you recommend for ease of implementation with LangChain?

theacesystem
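On the question of substituting for Pinecone: vector stores are largely interchangeable because they share a tiny contract (add embedded texts, return the nearest texts for a query embedding), which is why LangChain can expose FAISS, Chroma, and Pinecone behind one interface. To show why the swap is mechanical, here is a toy in-memory store implementing that contract; the class and its 3-dimensional "embeddings" are illustrative stand-ins, not LangChain code.

```python
import math

# Sketch: the minimal add/search contract that vector stores share.
# This toy in-memory version ranks stored texts by cosine similarity;
# the hand-written 3-d embeddings stand in for a real embedding model.

class InMemoryVectorStore:
    def __init__(self):
        self._items: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], text: str) -> None:
        self._items.append((embedding, text))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        ranked = sorted(self._items, key=lambda it: cosine(query, it[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]

store = InMemoryVectorStore()
store.add([1.0, 0.0, 0.0], "Pinecone is a managed vector database.")
store.add([0.0, 1.0, 0.0], "Chroma runs locally and is open source.")
top = store.search([0.1, 0.9, 0.0], k=1)
```

Because retrieval only ever goes through `add` and `search`-shaped calls, trading a hosted store for a local one (e.g., Chroma or FAISS) is typically a one-line change in the pipeline's setup rather than a rewrite.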

Amazing! Can you also explain how to do the same type of evaluation on Vision Language Models that use images?

MohammadEskandari-doxy

How can I use Hugging Face LLMs to generate the testset?

francescofisica

So the one thing you learn training ML models is that you don't evaluate your model on training data, and you have to be careful of data leakage. Here, you're providing Giskard your embedded documentation, meaning Giskard is likely using its own RAG system to generate test cases, which you then use to evaluate your own RAG system. Can you please explain how this isn't nonsense? Do you evaluate the accuracy of the Giskard test cases beyond the superficial "looks good to me" method that you claim to be replacing? What metrics do you evaluate Giskard's test cases against, since its answers are also subjective? You're just entrusting that subjective evaluation to another LLM.

maxisqt

Do we need a paid subscription to the OpenAI APIs to be able to use Giskard?

dhrroovv