Reward Hacking in AI

Like humans, artificially intelligent agents strive to maximize their reward. Both humans and AI systems can become very good at gaming the system by finding loopholes.

0:00 Intro
0:19 Standardized Tests and Campbell's Law
1:04 Job Interviews
1:34 Academic Metrics
2:12 Reward Hacking in Artificial Intelligence
3:05 Reward Functions and Reward Shaping
3:58 Cobra Effect
4:32 Reward Tampering
5:17 Unforeseen Consequences
5:35 Outro

Related Articles:

Goodhart’s Law: Are Academic Metrics Being Gamed?

Faulty Reward Functions in the Wild

Learning Montezuma’s Revenge from a Single Demonstration

Comments

Real-life examples (cobra effect, drug addiction, lobbying) are actually more representative than examples from the field of AI research. Thanks for the video.

smaginandrew

Awesome, I like these kinds of video snippets. Thank you for your contribution to this community. And keep doing the same :)

GauravSharma-uiyd

Very informative video. Thanks.
Good analogies.

rickmorty

Interesting take on things! The example of that game reminds me of a 2 Minute Papers video where an AI gamed a system to glitch through blocks. Good video too!

tonksonk

I'm totally ignorant when it comes to AI, but in the boat race example, wouldn't it be possible to fix the problem by requiring that the AI cross the finish line within a certain time, or by giving it higher rewards the shorter its race time is?

GrayCatbird

You're great. Thanks very much! Subscribed.

RagdollRocket

Great video! Do you have any videos on optimizers? I'm curious about your take on how optimizers can get stuck in weird minima or at saddle points.

saeidbagheri

Great video again. I don't doubt you can reach a lot more people if you keep this up. Just gotta get on the good side of the A L G O R I T H M once :). Maybe you could try to get rid of the echo; it would make the audio a lot more appealing. Looking forward to more videos!

abdullahkilinc
