Hallucination-Free LLMs: Strategies for Monitoring and Mitigation

The talk will cover why and how to monitor LLMs deployed to production. We will focus on the state-of-the-art solutions for detecting hallucinations, split into two types:
1. Uncertainty Quantification
2. LLM self-evaluation

In the Uncertainty Quantification part, we will discuss algorithms that leverage token probabilities to estimate the quality of model responses. This includes simple accuracy estimation as well as more advanced methods for estimating Semantic Uncertainty or arbitrary classification metrics.
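As a minimal sketch of the token-probability idea: most LLM APIs can return per-token log-probabilities alongside the generated text, and a length-normalized sequence probability (the exponential of the mean token log-prob) is a simple confidence score. The example log-prob values below are hypothetical, not taken from the talk.

```python
import math

def sequence_confidence(token_logprobs):
    """Length-normalized sequence probability: exp of the mean
    per-token log-probability. Values lie in (0, 1]; a low score
    suggests the model was uncertain while generating the answer.

    token_logprobs: per-token log-probs (all <= 0), as exposed by
    many LLM APIs via a `logprobs`-style option (assumed here).
    """
    if not token_logprobs:
        raise ValueError("empty response")
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_lp)

# Hypothetical log-probs for a 4-token answer; one token (-2.30)
# was a low-probability choice and drags the confidence down.
score = sequence_confidence([-0.05, -0.10, -2.30, -0.20])
```

Averaging in log space before exponentiating keeps the score comparable across answers of different lengths, which raw sequence probability is not.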

In the LLM self-evaluation part, we will cover using (potentially the same) LLM to quantify the quality of the answer. We will also cover state-of-the-art algorithms such as SelfCheckGPT and LLM-eval.

You will build an intuitive understanding of the LLM monitoring methods, their strengths and weaknesses, and learn how to easily set up an LLM monitoring system.
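A monitoring setup can be as simple as a hook that logs every response together with its confidence score and flags low-confidence answers for review. The threshold and score source below are assumptions for illustration, not something prescribed by the talk.

```python
import logging

def monitor(prompt: str, response: str, confidence: float,
            threshold: float = 0.5) -> bool:
    """Log every response; flag low-confidence ones for review.

    `confidence` is any score in (0, 1], e.g. from uncertainty
    quantification or an LLM self-evaluation step (assumed here).
    Returns False when the answer is flagged, so the caller can
    retry, escalate, or attach a disclaimer.
    """
    logging.info("prompt=%r confidence=%.2f", prompt, confidence)
    if confidence < threshold:
        logging.warning("possible hallucination: %r", response)
        return False
    return True

ok = monitor("Capital of France?", "Paris", confidence=0.92)
flagged = monitor("Capital of Atlantis?", "Poseidonis", confidence=0.18)
```

In production the logging call would typically ship to a metrics backend so that the rate of flagged answers can be tracked over time.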

Table of Contents:
00:00 Introduction
02:33 What is LLM Monitoring
08:10 LLM-Based Hallucination Detection: Consistency
12:43 LLM-Based Hallucination Detection: Answer Evaluation
17:12 Output Uncertainty Quantification
23:00 Semantic Uncertainty Quantification
29:10 Experiment Results
----------

👉 Learn more about Data Science Dojo here:

👉 Watch the latest video tutorials here:

👉 See what our past attendees are saying here:
--
At Data Science Dojo, we believe data science is for everyone. Our in-person data science training has been attended by more than 8,000 employees from over 2,000 companies globally, including many leaders in tech like Microsoft, Apple, and Facebook.
--

-----
#ArtificialIntelligence #AI #MachineLearning #DataScience #LargeLanguageModels #llm #hallucinations
Comments:

An obvious approach is to run a second AI agent as a supervisor that evaluates the answers.
Running a team of AI agents gives significantly better results on every metric, and where it makes sense, the more expensive models can instruct less powerful, cheaper models to make the setup more cost-effective.

yarpenzigrin