Evaluating Language Models // Matthew Sharp // AI in Production Conference Lightning Talk
// Abstract
Matt talks about the challenges of evaluating language models and how to address them, which metrics you can use, and the datasets available. He also discusses the difficulties of continuous evaluation in production and common pitfalls. Takeaways: a call to action to contribute to public evaluation datasets, and a push for a more concerted effort from the community to reduce harmful bias.
// Bio
Matt is the author of LLMs in Production (Manning Publications). He has worked in ML/AI for over ten years, building machine learning platforms for startups and large tech companies alike. His career has focused mainly on deploying models to production.
A big thank you to our Premium Sponsors, @Databricks and @baseten for their generous support!
// Sign up for our Newsletter to never miss an event:
// Watch all the conference videos here:
// Read our blog:
// Join an in-person local meetup near you:
// MLOps Swag/Merch:
// Follow us on Twitter:
// Follow us on LinkedIn: