LLM Evals and LLM as a Judge: Fundamentals

What are LLM evals, and how should you use them when productionizing generative AI applications? This rapid-fire technical session, the first in a series, covers the prevailing approaches and metrics for evaluating LLM applications, including LLM as a judge, user-provided feedback, golden datasets, and business metrics, along with emerging best practices.
To get a copy of the presentation or ask follow-up questions, please join the Arize community:
0:00 Introduction
0:55 Evaluation Metrics for LLM Applications
1:52 LLM as a Judge
4:02 Types of LLM Evals
4:11 Customizing Evaluations
6:29 Best Practices and Pitfalls
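
As a quick illustration of the LLM-as-a-judge approach covered in the video, below is a minimal sketch in which one model grades another model's answer against a reference. The judge prompt wording, the "correct"/"incorrect" label set, the model name, and the use of the OpenAI chat API are illustrative assumptions, not the exact setup shown in the talk.

```python
# Minimal LLM-as-a-judge sketch (illustrative; model, prompt, and labels are assumptions).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_TEMPLATE = """You are evaluating a question-answering system.

Question: {question}
Reference answer: {reference}
Model answer: {answer}

Respond with a single word: "correct" if the model answer agrees with the
reference answer, or "incorrect" if it does not."""


def judge_answer(question: str, reference: str, answer: str) -> str:
    """Ask a judge model to label a single (question, answer) pair."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of judge model
        temperature=0,        # deterministic grading
        messages=[{
            "role": "user",
            "content": JUDGE_TEMPLATE.format(
                question=question, reference=reference, answer=answer),
        }],
    )
    return response.choices[0].message.content.strip().lower()


if __name__ == "__main__":
    label = judge_answer(
        question="What is the capital of France?",
        reference="Paris",
        answer="The capital of France is Paris.",
    )
    print(label)  # expected: "correct"
```

In practice, labels like these are aggregated across a golden dataset to produce an evaluation metric (for example, percent "correct"), which is the pattern the talk builds on.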