Mitigating LLM Hallucinations with a Metrics-First Evaluation Framework

Join this workshop, where we will showcase powerful metrics for evaluating the quality of inputs and outputs, with a focus on both RAG and fine-tuning use cases. In the context of LLMs, “hallucination” refers to a phenomenon where the model generates text that is incorrect, nonsensical, or not real. Since LLMs are not databases or search engines, they do not cite the sources their responses are based on. These models generate text as an extrapolation from the prompt you provide.

What attendees can expect to take away from the workshop:

-Deep dive into research-backed metrics to evaluate the quality of the inputs (data quality, RAG context quality, etc.) and outputs (hallucinations) while building LLM-powered applications
-Evaluation and experimentation framework for prompt engineering with RAG, as well as for fine-tuning with your own data
-Demo-led practical guide to building guardrails and mitigating hallucinations while building LLM-powered applications

To access the slides, please click here:

To read the academic paper, please click here:

This event is inspired by DeepLearning.AI’s GenAI short courses, created in collaboration with AI companies across the globe. Our courses help you learn new skills, tools, and concepts efficiently within 1 hour.

About Galileo
At Galileo, we are building the first algorithm-powered LLMOps platform for the enterprise. Galileo provides ML teams with an intelligent ML data bench to collaboratively improve data quality across their model workflows, from pre-training to post-production. Galileo currently powers ML teams across the Fortune 500 as well as startups across multiple industries.

Speakers:

Vikram Chatterji, Co-founder and CEO at Galileo

Atindriyo Sanyal, Co-founder and CTO at Galileo
Comments
Author

The real contribution seems to be the prompt they used to generate the CoT and the metric value... Could you share the code used for the metric and the prompt for ChatGPT?

MMSS-eo
Author

Thank you for the presentation and demo!

HonestGraduate
Author

The paper and the slides are both in the description, guys. :) Read.

KokkeOP
Author

Nice talk! Could you please share the notebook?

MMSS-eo
Author

Do you think human intervention in the evaluation process is going to last? It seems it's a process that LLMs could achieve by themselves in the near future.

danteblink
Author

Guys would u be able to drop the notebook please?

zaursamedov
Author

Could someone share the link to the paper that was mentioned here? "ChainPoll", I believe.

komalmistry
Author

I don't know how, but I searched the n word and it came up

davidvilla