Reward Hacking in AI

Like humans, artificially intelligent agents strive to maximize their reward. Both humans and AI systems can become very good at gaming a system by exploiting its loopholes.
0:00 Intro
0:19 Standardized Tests and Campbell's Law
1:04 Job Interviews
1:34 Academic Metrics
2:12 Reward Hacking in Artificial Intelligence
3:05 Reward Functions and Reward Shaping
3:58 Cobra Effect
4:32 Reward Tampering
5:17 Unforeseen Consequences
5:35 Outro
Related Articles:
Goodhart’s Law: Are Academic Metrics Being Gamed?
Faulty Reward Functions in the Wild
Learning Montezuma’s Revenge from a Single Demonstration