The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

This "Alignment" thing turns out to be even harder than we thought.

# Other Media
The Simpsons Season 5 Episode 19: "Sweet Seymour Skinner's Baadasssss Song"

With thanks to my excellent Patreon supporters:
- Timothy Lillicrap
- Gladamas
- James
- Scott Worley
- Chad Jones
- Shevis Johnson
- JJ Hepboin
- Pedro A Ortega
- Said Polat
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Peter Rolf
- Teague Lasser
- Andrew Blackledge
- Frank Marsman
- Brad Brookshire
- Cam MacFarlane
- Jason Hise
- Phil Moyer
- Erik de Bruijn
- Alec Johnson
- Clemens Arbesser
- Ludwig Schubert
- Allen Faure
- Eric James
- Matheson Bayley
- Qeith Wreid
- jugettje dutchking
- Owen Campbell-Moore
- Atzin Espino-Murnane
- Johnny Vaughan
- Jacob Van Buren
- Jonatan R
- Ingvi Gautsson
- Michael Greve
- Tom O'Connor
- Laura Olds
- Jon Halliday
- Paul Hobbs
- Jeroen De Dauw
- Lupuleasa Ionuț
- Cooper Lawton
- Tim Neilson
- Eric Scammell
- Igor Keller
- Ben Glanton
- anul kumar sinha
- Duncan Orr
- Will Glynn
- Tyler Herrmann
- Tomas Sayder
- Ian Munro
- Joshua Davis
- Jérôme Beaulieu
- Nathan Fish
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Benjamin Watkin
- Sebastian Birjoveanu
- Andrew Harcourt
- Luc Ritchie
- Nicholas Guyett
- James Hinchcliffe
- 12tone
- Oliver Habryka
- Chris Beacham
- Zachary Gidwitz
- Nikita Kiriy
- Parker
- Andrew Schreiber
- Steve Trambert
- Mario Lois
- Abigail Novick
- Сергей Уваров
- Bela R
- Mink
- Fionn
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Kabs
- Miłosz Wierzbicki
- Tendayi Mawushe
- Jake Fish
- Wr4thon
- Martin Ottosen
- Robert Hildebrandt
- Poker Chen
- Kees
- Darko Sperac
- Paul Moffat
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Robin Scharf
- Klemen Slavic
- Patrick Henderson
- Oct todo22
- Melisa Kostrzewski
- Hendrik
- Daniel Munter
- Alex Knauth
- Kasper
- Ian Reyes
- James Fowkes
- Tom Sayer
- Len
- Alan Bandurka
- Ben H
- Simon Pilkington
- Daniel Kokotajlo
- Peter Hozák
- Diagon
- Andreas Blomqvist
- Bertalan Bodor
- David Morgan
- Zannheim
- Daniel Eickhardt
- lyon549
- Ihor Mukha
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- ib_
- Thomas Dingemanse
- Stuart Alldritt
- Alexander Brown
- Devon Bernard
- Ted Stokes
- James Helms
- Jesper Andersson
- DeepFriedJif
- Chris Dinant
- Raphaël Lévy
- Johannes Walter
- Matt Stanton
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Scott Viteri
- Conor Comiconor
- Michael Roeschter
- Georg Grass
- Isak
- Matthias Hölzl
- Jim Renney
- Edison Franklin
- Piers Calderwood
- Krzysztof Derecki
- Mikhail Tikhomirov
- Richard Otto
- Matt Brauer
- Jaeson Booker
- Mateusz Krzaczek
- Artem Honcharov
- Michael Walters
- Tomasz Gliniecki
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Neil Palmere
- Rajeen Nabid
- Christian Epple
- Clark Schaefer
- Olivier Coutu
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Eldard
- Eric Rogstad
- Eric Carlson
- Caleb Larson
- Braden Tisdale
- Max Chiswick
- Aron
- David de Kloet
- Sam Freedo
- slindenau
- A21
- Rodrigo Couto
- Johannes Lindmark
- Nicholas Turner
- Tero K
# Comments

This reminds me of a story. My father was very strict and would punish me for every perceived misstep of mine. He believed this would "optimize" me towards not making any more missteps, but what it really did was optimize me to get really good at hiding them. After all, if he never catches a misstep of mine, then I won't get punished, and I reach my objective.

MechMK

"Ok, I'll do the homework, but when I grow up, I'll buy all the toys and play all day long!" - some AI

umblapag

Mesa Optimizer: "I have determined the best way to achieve the Mesa Objective is to build an Optimizer"

EDoyl

I think one of the many benefits of studying AI is how much it's teaching us about human behaviour.

egodreas

At the start of the video, I was keen to suggest that maybe the first thing we should get AI to do is to comprehend the totality of human ethics, then it will understand our objectives in the way we understand them. At the end of the video, I realised that the optimal strategy for the AI, when we do this, is to pretend to have comprehended the totality of human ethics, just so as to escape the classroom.

AtomicShrimp

"It's... alignment problems all the way down"

KilgoreTroutAsf

The first thought that came to mind when I finished the video is how criminals/patients/addicts will fake the result their supervisor wants to see, only to go back on it as soon as they are released from that environment. It's a bit frightening to think: if humans can outsmart humans with relative ease, what could a true AI do?

OnlineMasterPlayer

13:13 _"... but it's learned to want the wrong thing."_
like, say, humans and sugar?

thoperSought

"Just solving the outer alignment problem might not be enough."

Isn't this basically what happens when people go to therapy but have a hard time changing their behaviour?
They clearly understand how a certain behaviour has a negative impact on their lives (they went to therapy in the first place), and yet they can't seem to get rid of it.
They have solved the outer alignment problem but not the inner alignment one.

Emanuel-sla-hi

It is also interesting to think about this problem in the context of organizations. When an organization tries to "optimize" employees' performance by introducing KPIs, in order to be "more objective" and "easier to measure", it actually gives the mesa-optimizers (the employees) a utility function (mesa-objective) that is guaranteed to be misaligned with the base objective.

stick

"When I read this paper I was shocked that such a major issue was new to me. What other big classes of problems have we just... not thought of yet?"
Terrifying is the word. I too had completely missed this problem, and fuck me, it's a unit. There's no preventing unknown unknowns; knowing this, we need to work on AI safety even harder.

Xartab

Base optimizer: Educate people on the safety issues of AI
Mesa-optimizer: Make a do-do joke

doodlebobascending

"Deceptive misaligned mesa-optimiser" - got to throw that randomly into my conversation today! Or maybe print it on a T-Shirt. :-)

Jimbaloidatron

This video should be tagged with [don't put in any AI training datasets]

liamkeough

"Plants follow simple rules"
*laughs in we don't even completely understand the mechanisms controlling stomatal aperture yet, while shoots are a thousand times easier to study than roots*

Fluxquark

Once you started talking about gradient descent finding the Wikipedia article on ethics and pointing to it, I thought the punchline of that example would be the mesa-optimizer figuring out how to edit that article.

MebRappa

Sorry I couldn't join the Discord chat. Just wanted to say that this presentation did a good job of explaining a complex idea. It certainly gave me something to chew on. The time it takes to do these is appreciated.

dwreid

Let's call it a mesa-optimizer because calling it a suboptimizer is suboptimal.

mukkor

As I watched your channel,
I thought "the alignment problem is hard, but very competent people are working on it."
Then I watched this latest video,
and thought "that AI stuff is freakish hardcore."

sylvainprigent

Now we add a third optimizer to maximize the alignment and call it a meta-optimizer. This system is guaranteed to maximize confusion!

cmilkau