Reducing Hallucinations and Evaluating LLMs for Production - Divyansh Chaurasia, Deepchecks

This talk focuses on the challenges of evaluating LLMs and on hallucinations in LLM outputs. Evaluating the performance and capabilities of LLMs remains difficult due to their size, complexity, and inherent biases. It will cover traditional evaluation methods such as BLEU, F1 scores, and human evaluation, compare them with modern approaches such as EleutherAI's evaluation framework, and showcase open-source LLM validation modules.
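As a rough illustration of the traditional metrics mentioned above, here is a minimal sketch that computes corpus-level BLEU with the sacrebleu package and a SQuAD-style token-overlap F1 in plain Python. The example strings and the token_f1 helper are illustrative assumptions, not material from the talk.

```python
# A minimal sketch, assuming sacrebleu is installed (pip install sacrebleu).
# Example data and the token_f1 helper are illustrative, not from the talk.
from collections import Counter

import sacrebleu


def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-overlap F1 between a prediction and a reference."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


hypotheses = ["The Eiffel Tower is located in Paris."]
references = ["The Eiffel Tower is in Paris, France."]

# Corpus-level BLEU; sacrebleu takes one list of hypotheses and a list of
# reference streams aligned with those hypotheses.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
print(f"Token F1: {token_f1(hypotheses[0], references[0]):.2f}")
```

Metrics like these capture surface overlap with a reference, which is one reason the talk contrasts them with model-based and human evaluation.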

This talk will also cover how to reduce hallucinations in LLM outputs, examining their underlying causes, including biases in training data, overfitting, and the lack of explicit fact-checking mechanisms.

The talk will cover the mechanisms, open-source frameworks, and best practices that can minimize LLM hallucinations and make these models ready for production.
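One common mitigation pattern behind such frameworks is checking whether an answer is grounded in retrieved source text before serving it. The toy check below is a hypothetical illustration written for this description, not code from the talk or from Deepchecks: it flags an answer when too few of its content words appear in the supplied context. Production systems typically rely on NLI models or LLM-as-judge scoring rather than this keyword heuristic.

```python
# A toy grounding check, assuming answers should be supported by a retrieved
# context string. Hypothetical helper for illustration only; real validation
# modules are far more robust than this word-overlap heuristic.
import re

STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "in", "of", "to", "and"}


def is_grounded(answer: str, context: str, threshold: float = 0.6) -> bool:
    """Flag a possible hallucination when too few content words of the
    answer appear in the retrieved context."""
    answer_words = {w for w in re.findall(r"[a-z']+", answer.lower()) if w not in STOPWORDS}
    context_words = set(re.findall(r"[a-z']+", context.lower()))
    if not answer_words:
        return True  # nothing to verify
    support = len(answer_words & context_words) / len(answer_words)
    return support >= threshold


context = "The Great Wall of China is over 21,000 km long and was built over centuries."
answer = "The Great Wall of China was completed in a single decade."
print(is_grounded(answer, context))  # False -> route to review or regenerate
```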

This talk is intended for practitioners, researchers, and enthusiasts in the LLM space with a basic understanding of language models.