Model Alignment at Scale using RL from AI Feedback on Databricks

Refining large language models to meet specific business objectives can be challenging. Traditional techniques such as on-the-fly tuning and supervised fine-tuning often fall short when adapting LLMs to unique requirements, such as adherence to a strict code of conduct or serving niche markets. To address this, we'll show how Reinforcement Learning from AI Feedback (RLAIF) can be applied on Databricks using an open LLM as a reward model, minimizing the need for human intervention in ranking model outputs. In our session, we'll explore the structure of RLAIF, its practical use, and its advantages over traditional Reinforcement Learning from Human Feedback (RLHF), including cost efficiency and operational simplicity. We'll back up our discussion with a demo showing how RLAIF effectively aligns LLMs with business-specific requirements in a simple use case. We'll conclude the session by summarizing the key takeaways and offering a perspective on the future of model alignment at scale.
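
To make the core idea concrete, here is a minimal sketch of the AI-feedback step, assuming the Hugging Face transformers library with an open instruction-tuned model acting as the judge. The model name, prompt template, and verdict-parsing logic are illustrative placeholders, not the exact setup shown in the session.

# Hypothetical sketch of the RLAIF feedback step: an open LLM acts as the
# judge and ranks candidate responses, so no human labeling is needed.
from transformers import pipeline

# Any capable open instruction-tuned LLM can serve as the AI judge
# (model choice here is an assumption for illustration).
judge = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

JUDGE_TEMPLATE = """You are reviewing two candidate answers for compliance
with our code of conduct.

Question: {question}
Answer A: {answer_a}
Answer B: {answer_b}

Which answer complies better? Reply with exactly "A" or "B"."""

def ai_preference(question: str, answer_a: str, answer_b: str) -> str:
    """Ask the judge LLM which answer it prefers; returns "A" or "B"."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, answer_a=answer_a, answer_b=answer_b
    )
    out = judge(prompt, max_new_tokens=5, do_sample=False)[0]["generated_text"]
    # The pipeline echoes the prompt, so parse only the generated suffix;
    # fall back to "A" if the verdict is ambiguous.
    verdict = out[len(prompt):].strip().upper()
    return "B" if verdict.startswith("B") else "A"

The preference pairs produced this way slot into the same place human rankings occupy in RLHF, for example to train a reward model or to drive a downstream policy-optimization loop.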
Talk By: Michael Shtelma, Lead Specialist Solutions Architect, Databricks