The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment

This "Alignment" thing turns out to be even harder than we thought.

# Other Media
The Simpsons Season 5 Episode 19: "Sweet Seymour Skinner's Baadasssss Song"

With thanks to my excellent Patreon supporters:
- Timothy Lillicrap
- Gladamas
- James
- Scott Worley
- Chad Jones
- Shevis Johnson
- JJ Hepboin
- Pedro A Ortega
- Said Polat
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Peter Rolf
- Teague Lasser
- Andrew Blackledge
- Frank Marsman
- Brad Brookshire
- Cam MacFarlane
- Jason Hise
- Phil Moyer
- Erik de Bruijn
- Alec Johnson
- Clemens Arbesser
- Ludwig Schubert
- Allen Faure
- Eric James
- Matheson Bayley
- Qeith Wreid
- jugettje dutchking
- Owen Campbell-Moore
- Atzin Espino-Murnane
- Johnny Vaughan
- Jacob Van Buren
- Jonatan R
- Ingvi Gautsson
- Michael Greve
- Tom O'Connor
- Laura Olds
- Jon Halliday
- Paul Hobbs
- Jeroen De Dauw
- Lupuleasa Ionuț
- Cooper Lawton
- Tim Neilson
- Eric Scammell
- Igor Keller
- Ben Glanton
- anul kumar sinha
- Duncan Orr
- Will Glynn
- Tyler Herrmann
- Tomas Sayder
- Ian Munro
- Joshua Davis
- Jérôme Beaulieu
- Nathan Fish
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Benjamin Watkin
- Sebastian Birjoveanu
- Andrew Harcourt
- Luc Ritchie
- Nicholas Guyett
- James Hinchcliffe
- 12tone
- Oliver Habryka
- Chris Beacham
- Zachary Gidwitz
- Nikita Kiriy
- Parker
- Andrew Schreiber
- Steve Trambert
- Mario Lois
- Abigail Novick
- Сергей Уваров
- Bela R
- Mink
- Fionn
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Kabs
- Miłosz Wierzbicki
- Tendayi Mawushe
- Jake Fish
- Wr4thon
- Martin Ottosen
- Robert Hildebrandt
- Poker Chen
- Kees
- Darko Sperac
- Paul Moffat
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Robin Scharf
- Klemen Slavic
- Patrick Henderson
- Oct todo22
- Melisa Kostrzewski
- Hendrik
- Daniel Munter
- Alex Knauth
- Kasper
- Ian Reyes
- James Fowkes
- Tom Sayer
- Len
- Alan Bandurka
- Ben H
- Simon Pilkington
- Daniel Kokotajlo
- Peter Hozák
- Diagon
- Andreas Blomqvist
- Bertalan Bodor
- David Morgan
- Zannheim
- Daniel Eickhardt
- lyon549
- Ihor Mukha
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- ib_
- Thomas Dingemanse
- Stuart Alldritt
- Alexander Brown
- Devon Bernard
- Ted Stokes
- James Helms
- Jesper Andersson
- DeepFriedJif
- Chris Dinant
- Raphaël Lévy
- Johannes Walter
- Matt Stanton
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Scott Viteri
- Conor Comiconor
- Michael Roeschter
- Georg Grass
- Isak
- Matthias Hölzl
- Jim Renney
- Edison Franklin
- Piers Calderwood
- Krzysztof Derecki
- Mikhail Tikhomirov
- Richard Otto
- Matt Brauer
- Jaeson Booker
- Mateusz Krzaczek
- Artem Honcharov
- Michael Walters
- Tomasz Gliniecki
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Neil Palmere
- Rajeen Nabid
- Christian Epple
- Clark Schaefer
- Olivier Coutu
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Eldard
- Eric Rogstad
- Eric Carlson
- Caleb Larson
- Braden Tisdale
- Max Chiswick
- Aron
- David de Kloet
- Sam Freedo
- slindenau
- A21
- Rodrigo Couto
- Johannes Lindmark
- Nicholas Turner
- Tero K
# Comments

This reminds me of a story. My father was very strict and would punish me for every perceived misstep of mine. He believed this would "optimize" me towards not making any more missteps, but what it really did was optimize me to get really good at hiding them. After all, if he never catches a misstep of mine, then I won't get punished, and I reach my objective.

MechMK

"Ok, I'll do the homework, but when I grow up, I'll buy all the toys and play all day long!" - some AI

umblapag

Mesa Optimizer: "I have determined the best way to achieve the Mesa Objective is to build an Optimizer"

EDoyl

I think one of the many benefits of studying AI is how much it's teaching us about human behaviour.

egodreas

At the start of the video, I was keen to suggest that maybe the first thing we should get AI to do is to comprehend the totality of human ethics, then it will understand our objectives in the way we understand them. At the end of the video, I realised that the optimal strategy for the AI, when we do this, is to pretend to have comprehended the totality of human ethics, just so as to escape the classroom.

AtomicShrimp

"It's... alignment problems all the way down"

KilgoreTroutAsf

The first thought that came to mind when I finished the video is how criminals/patients/addicts will fake the result their supervisor wants to see, only to go back on it as soon as they are released from that environment. It's a bit frightening to think: if humans can outsmart humans with relative ease, what could a true AI do?

OnlineMasterPlayer

13:13 _"... but it's learned to want the wrong thing."_
like, say, humans and sugar?

thoperSought

"Just solving the outer alignment problem might not be enough."

Isn't this basically what happens when people go to therapy but have a hard time changing their behaviour?
They clearly understand how a certain behaviour has a negative impact on their lives (they went to therapy in the first place), and yet they can't seem to get rid of it.
They have solved the outer alignment problem but not the inner alignment one.

Emanuel-sla-hi

It is also interesting to think about this problem in the context of organizations. When an organization tries to "optimize" employees' performance by introducing KPIs, in order to be "more objective" and "easier to measure", it actually gives the mesa-optimizers (the employees) a utility function (mesa-objective) that is guaranteed to be misaligned with the base objective.

stick

"When I read this paper I was shocked that such a major issue was new to me. What other big classes of problems have we just... not thought of yet?"
Terrifying is the word. I too had completely missed this problem, and fuck me, it's a unit. There's no preventing unknown unknowns; knowing this, we need to work on AI safety even harder.

Xartab

Base optimizer: Educate people on the safety issues of AI
Mesa-optimizer: Make a do-do joke

doodlebobascending

"Deceptive misaligned mesa-optimiser" - got to throw that randomly into my conversation today! Or maybe print it on a T-Shirt. :-)

Jimbaloidatron

This video should be tagged with [don't put in any AI training datasets]

liamkeough

"Plants follow simple rules"
*laughs in we don't even completely understand the mechanisms controlling stomatal aperture yet, while shoots are a thousand times easier to study than roots*

Fluxquark

Once you started talking about gradient descent finding the Wikipedia article on ethics and pointing to it, I thought the punchline of that example would be the mesa-optimizer figuring out how to edit that article.

MebRappa

Sorry I couldn't join the Discord chat. Just wanted to say that this presentation did a good job of explaining a complex idea. It certainly gave me something to chew on. The time it takes to do these is appreciated.

dwreid

Let's call it a mesa-optimizer because calling it a suboptimizer is suboptimal.

mukkor

As I watched your channel,
I thought "the alignment problem is hard, but very competent people are working on it."
Then I watched this latest video,
and thought "that AI stuff is freakish hardcore."

sylvainprigent

Now we add a third optimizer to maximize the alignment and call it a meta-optimizer. This system is guaranteed to maximize confusion!

cmilkau