Evaluate LLM Systems & RAGs: Choose the Best LLM Using Automatic Metrics on Your Dataset

Learn how to evaluate new Large Language Models (LLMs) using automated metrics on your own custom dataset. The video covers best practices for choosing the right LLM for your specific project and compares how candidate models perform on various tasks.

00:00 - Intro
01:10 - LLM evaluation approaches
05:36 - Available tools & metrics
08:04 - Evaluation process
08:55 - Google Colab setup
09:49 - Dataset
11:25 - Generate model predictions
12:50 - Naive evaluation
14:55 - Use AI to evaluate AI
19:00 - Evaluation report
21:14 - Conclusion

#rag #llama3 #llm #langchain #python #artificialintelligence

Comments

Excellent explanation and presentation! Well done Sr!

Cyberspider

Perfect. Content like this reminds me subscribing to your channel and website was a good decision. I'm curious to dig into the prompt they used for their metrics. I checked their documentation, and it allows defining custom metrics. That is something really useful, I will share your video in my X (twitter) account, thx.

unclecode

Related videos

How to evaluate an LLM-powered RAG application automatically.

Evaluate LLMs - RAG

RAG Time! Evaluate RAG with LLM Evals and Benchmarking

Learn to Evaluate LLMs and RAG Approaches

Session 7: RAG Evaluation with RAGAS and How to Improve Retrieval

Evaluating LLMs using Langchain

Evaluating the Output of Your LLM (Large Language Models): Insights from Microsoft & LangChain

AI Agent Evaluation with RAGAS

LangSmith Tutorial - LLM Evaluation for Beginners

RAGAS: How to Evaluate a RAG Application Like a Pro for Beginners

How Large Language Models Work

Evaluating LLM-based Applications

Building Production-Ready RAG Applications: Jerry Liu

Evaluating RAG Applications #ai #llm

How to Build, Evaluate, and Iterate on LLM Agents

Optimization of LLM Systems with DSPy and LangChain/LangSmith

LangChain 'RAG Evaluation' Webinar

LLM Explained | What is LLM

Developing and Serving RAG-Based LLM Applications in Production

Mitigating LLM Hallucinations with a Metrics-First Evaluation Framework

What is Prompt Tuning?

25 LLM tested as AGENTS for our Chains: CoT, Reasoning, ...

Testing Framework Giskard for LLM and RAG Evaluation (Bias, Hallucination, and More)