🔥 Live Demo: Reinforcement Fine-Tuning for LLMs — Build Smarter Models with Less Data | Tutorial

Tired of labeling thousands of examples just to fine-tune your LLM? There’s a better way — it’s called Reinforcement Fine-Tuning (RFT). 💡

In this hands-on webinar, the Predibase team introduces the first end-to-end RFT platform designed to supercharge LLM customization with minimal data — and maximum control. Whether you're working with open-source LLMs or enterprise AI models, this session will show you how to go from prototype to production using cutting-edge GRPO-based fine-tuning workflows.

👇 What You’ll Learn:
✅ What Reinforcement Fine-Tuning (RFT) is and how it works
✅ When to use RFT vs Supervised Fine-Tuning (SFT)
✅ Real-world use cases: code generation, multi-step reasoning, math tasks
✅ How to write reward functions and dynamically update them
✅ Live demo of RFT training with observability tools
✅ Behind the scenes of Predibase's managed infrastructure (Lorax, GRPO)
✅ Why RFT beats SFT for many modern ML workflows
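
To make the reward-function topic above concrete, here is a minimal, illustrative sketch of a rule-based reward function for a math task. The signature and scoring logic are assumptions for demonstration, not Predibase's actual API — in practice you would register something like this with the platform and iterate on it during training.

```python
import re

def math_reward(prompt: str, completion: str, expected_answer: str) -> float:
    """Score a completion: full credit for the correct final answer,
    plus a small bonus for showing step-by-step work."""
    # Take the last number in the completion as the model's final answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    reward = 1.0 if numbers[-1] == expected_answer else 0.0
    # Small shaping bonus for multi-line reasoning, which GRPO can exploit
    # by comparing groups of sampled completions against each other.
    if len(completion.splitlines()) > 1:
        reward += 0.1
    return reward

print(math_reward("What is 6 * 7?", "6 * 7 = 42\nThe answer is 42", "42"))
```

Because the reward is ordinary code rather than labeled data, it can be updated mid-experiment — one reason RFT needs far fewer examples than SFT.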

🔔 Don’t forget to LIKE, COMMENT, and SUBSCRIBE for the latest on LLM fine-tuning, AI scaling, and reinforcement learning hacks!

Special thanks to @DevIntheDetails !

#llm #reinforcementlearning #finetuning #rft #aiinfrastructure #machinelearning #opensourceai #datascience #grpo #Predibase #mlops #LLMTraining #CustomLLM #MLTools #MLEngineering

00:00 - Intro – Why RFT is the Future of LLM Customization
02:30 - Meet the Engineers Behind Predibase RFT
05:00 - What is Reinforcement Fine-Tuning (RFT)?
07:45 - RFT vs SFT – When to Use Each
10:10 - Top Use Cases for RFT: Code, Math, Reasoning
14:20 - How Reward Functions Work in RFT
18:40 - Live Use Case: Function Calling & Model Errors
23:30 - Writing and Updating Reward Functions
28:15 - Live Demo – Training a Model with RFT on Predibase
34:00 - Behind the Scenes of Managed RFT
39:00 - Enterprise-Ready Features of Predibase RFT
42:00 - Live Q&A with the Founders and Engineers
Comments

In the function calling example showcased in the webinar, were only 20 samples enough to achieve 99% accuracy?

manish