Tailor-Made LLM Evaluations: How to Create Custom Evaluations for Your LLM - Linoy Cohen

In the ever-changing world of Generative AI, new LLMs are released daily, and while standardized scoring approaches exist for evaluating them, they don't always measure what actually matters to us. In this talk, we will go over the two main approaches to evaluating LLMs: benchmarking and LLM-as-a-judge. We will discuss which one to choose and how to create custom evaluations that suit our own use cases. Lastly, we will go over a set of best practices for building evaluations that produce an objective, deterministic score.
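For readers unfamiliar with the two approaches, the sketch below contrasts them: benchmarking scores model outputs against a fixed labeled dataset, while LLM-as-a-judge asks an evaluator model to grade answers against a custom rubric. This is a minimal illustration assuming an OpenAI-compatible Python client; the model name, the 1-5 rubric, and the helper names benchmark_accuracy and judge are hypothetical, not taken from the talk.

```python
# Sketches of the two evaluation approaches discussed in the talk.
# Assumes the openai package (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# 1. Benchmarking: score model outputs against a fixed labeled dataset.
def benchmark_accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact-match accuracy over a benchmark dataset (hypothetical helper)."""
    matches = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return matches / len(references)

# 2. LLM-as-a-judge: an evaluator model grades an answer against a rubric.
JUDGE_PROMPT = """You are an impartial evaluator.
Rate the ANSWER to the QUESTION on a scale of 1-5 for factual accuracy.
Reply with the integer score only.

QUESTION: {question}
ANSWER: {answer}"""

def judge(question: str, answer: str, model: str = "gpt-4o") -> int:
    """Return the judge model's 1-5 score (model name is illustrative)."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # reduce run-to-run variance toward a deterministic verdict
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
    )
    return int(response.choices[0].message.content.strip())

print(benchmark_accuracy(["Paris", "Berlin"], ["Paris", "Rome"]))  # 0.5
print(judge("What is the capital of France?", "Paris"))            # expected: 5
```

Pinning temperature to 0 and constraining the judge to a numeric-only reply are the kinds of choices the talk's best-practices section is about: they push the evaluation toward a score that is objective and repeatable.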

Linoy Cohen is a Data Scientist on the NLP team at Intuit. As part of her job, she is responsible for creating automatic evaluations that provide an objective way to measure the capabilities of LLMs against specific custom criteria and needs.

hayaData 2024, Tel-Aviv, Israel