Scaling Up “Vibe Checks” for LLMs - Shreya Shankar | Stanford MLSys #97
Episode 97 of the Stanford MLSys Seminar Series!
Scaling Up “Vibe Checks” for LLMs
Speaker: Shreya Shankar
Bio:
Shreya Shankar is a PhD student in computer science at UC Berkeley, advised by Dr. Aditya Parameswaran. Her research focuses on addressing data challenges in production machine learning pipelines through a human-centered approach. Her work has appeared in top database and human-computer interaction venues like VLDB, SIGMOD, CIDR, and CSCW. She is a recipient of the NDSEG Fellowship and co-organizes the DEEM workshop at SIGMOD, which focuses on data management in end-to-end machine learning.
Abstract:
Large language models (LLMs) are increasingly being used to write custom pipelines that repeatedly process or generate data of some sort. Despite their usefulness, LLM pipelines often produce errors, typically identified through manual “vibe checks” by developers. This talk explores automating this process using evaluation assistants, presenting a method for automatically generating assertions and an interface to help developers iterate on assertion sets. We share takeaways from a deployment with LangChain, where we auto-generated assertions for 2000+ real-world LLM pipelines. Finally, we discuss insights from a qualitative study of how 9 engineers use evaluation assistants: we highlight the subjective nature of "good" assertions and how they adapt over time with changes in prompts, data, LLMs, and pipeline components.
--
Stanford MLSys Seminar hosts: Avanika Narayan, Benjamin Spector, Michael Zhang
Twitter:
--
#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford