Evaluating LLM-based Applications // Josh Tobin // LLMs in Prod Conference Part 2

This portion is sponsored by Gantry: a simple, powerful SDK for model instrumentation.
Gantry's SDK gives you easy access to all of your production data and metrics, just by adding a few lines of code.
//Abstract
Evaluating LLM-based applications can feel like more of an art than a science. In this workshop, we'll give a hands-on introduction to evaluating language models. You'll come away with knowledge and tools you can use to evaluate your own applications, and answers to questions like:
Where do I get evaluation data from, anyway?
Is it possible to evaluate generative models in an automated way? What metrics can I use? (One example metric is sketched just after this list.)
What's the role of human evaluation?
//Bio
//Related videos
Evaluating LLM-based Applications // Josh Tobin // LLMs in Prod Conference Part 2
Evaluating LLM-based Applications
How to evaluate LLM Applications - Webinar by deepset.ai
Evaluating the Output of Your LLM (Large Language Models): Insights from Microsoft & LangChain
LLM Evaluation Basics: Datasets & Metrics
Josh Reini – TruEra – Evaluating and Tracking LLM Experiments: Building Better LLM Apps with TruLens...
Top 5 automated ways to evaluate LLMs
Master LLMs: Top Strategies to Evaluate LLM Performance
All About Evaluating LLM Applications // Shahul Es // MLOps Podcast #179
How to evaluate and choose a Large Language Model (LLM)
Josh Tobin: LLMOps: Test-Driven Development for Large Language Model Applications
How Large Language Models Work
Evaluation Approaches for Your LLM (Large Language Model): Insights from Microsoft & LangChain
Evaluating and Tracking LLM Experiments with TruLens
Benchmarking LLMs Explained: How to evaluate LLMs for your business
LLM Module 4: Fine-tuning and Evaluating LLMs | 4.2 Module Overview
Evaluation // Panel 1 // Large Language Models in Production Conference Part 2
Deep Dive into LLM Evaluation with Weights & Biases
How to Evaluate LLM Applications
LangSmith Tutorial - LLM Evaluation for Beginners
Evaluating Large Language Models on Clinical & Biomedical NLP Benchmarks
Building Defensible Products with LLMs // Raza Habib // LLMs in Production Conference Talk
Evaluating Large Language Models Trained on Code
Evaluating LLMs using Langchain