We Were Right! Real Inner Misalignment

Researchers ran real versions of the thought experiments in the 'Mesa-Optimisers' videos!
What they found won't shock you (if you've been paying attention)

Previous videos on the subject:

With thanks to my excellent Patreon supporters:
- Gladamas
- Timothy Lillicrap
- Kieryn
- AxisAngles
- James
- Jake Fish
- Scott Worley
- James Kirkland
- James E. Petts
- Chad Jones
- Shevis Johnson
- JJ Hepboin
- Pedro A Ortega
- Clemens Arbesser
- Said Polat
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Peter Rolf
- Teague Lasser
- Andrew Blackledge
- Brad Brookshire
- Cam MacFarlane
- Craig Mederios
- Jon Wright
- CaptObvious
- Brian Lonergan
- Girish Sastry
- Jason Hise
- Phil Moyer
- Erik de Bruijn
- Alec Johnson
- Ludwig Schubert
- Eric James
- Matheson Bayley
- Qeith Wreid
- jugettje dutchking
- James Hinchcliffe
- Atzin Espino-Murnane
- Carsten Milkau
- Jacob Van Buren
- Jonatan R
- Ingvi Gautsson
- Michael Greve
- Tom O'Connor
- Laura Olds
- Jon Halliday
- Paul Hobbs
- Jeroen De Dauw
- Cooper Lawton
- Tim Neilson
- Eric Scammell
- Igor Keller
- Ben Glanton
- Tor Barstad
- Duncan Orr
- Will Glynn
- Tyler Herrmann
- Ian Munro
- Jérôme Beaulieu
- Nathan Fish
- Peter Hozák
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Benjamin Watkin
- Andrew Harcourt
- Luc Ritchie
- Nicholas Guyett
- 12tone
- Oliver Habryka
- Chris Beacham
- Nikita Kiriy
- Andrew Schreiber
- Steve Trambert
- Braden Tisdale
- Abigail Novick
- Serge Var
- Mink
- Chris Rimmer
- Edmund Fokschaner
- April Clark
- J
- Nate Gardner
- John Aslanides
- Mara
- ErikBln
- DragonSheep
- Richard Newcombe
- Joshua Michel
- P
- Alex Doroff
- BlankProgram
- Richard
- David Morgan
- Fionn
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Kabs
- Ammar Mousali
- Miłosz Wierzbicki
- Tendayi Mawushe
- Wr4thon
- Martin Ottosen
- Andy K
- Kees
- Darko Sperac
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Robin Scharf
- Klemen Slavic
- Patrick Henderson
- Hendrik
- Daniel Munter
- Alex Knauth
- Kasper
- Ian Reyes
- James Fowkes
- Tom Sayer
- Len
- Alan Bandurka
- Ben H
- Simon Pilkington
- Daniel Kokotajlo
- Yuchong Li
- Diagon
- Andreas Blomqvist
- Iras
- Qwijibo (James)
- Zubin Madon
- Zannheim
- Daniel Eickhardt
- lyon549
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- ib_
- Thomas Dingemanse
- Stuart Alldritt
- Alexander Brown
- Devon Bernard
- Ted Stokes
- Jesper Andersson
- DeepFriedJif
- Chris Dinant
- Raphaël Lévy
- Johannes Walter
- Matt Stanton
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Clay Upton
- Conor Comiconor
- Michael Roeschter
- Georg Grass
- Isak Renström
- Matthias Hölzl
- Jim Renney
- Edison Franklin
- Piers Calderwood
- Mikhail Tikhomirov
- Matt Brauer
- Mateusz Krzaczek
- Artem Honcharov
- Tomasz Gliniecki
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Neil Palmere
- Rajeen Nabid
- Clark Schaefer
- Olivier Coutu
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Rogstad
- Eric Carlson
- Caleb Larson
- Max Chiswick
- Aron
- Sam Freedo
- slindenau
- Johannes Lindmark
- Nicholas Turner
- Intensifier
- Valerio Galieni
- FJannis
- Grant Parks
- Ryan W Ammons
- This person's name is too hard to pronounce
- contalloomlegs
- Everardo González Ávalos
- Knut Løklingholm
- Andrew McKnight
- Andrei Trifonov
- Aleks D
- Mutual Information
- Tim
- A Socialist Hobgoblin
- Bren Ehnebuske
- Martin Frassek
- Sven Drebitz
- Quabl
- Valentin Mocanu
- Philip Crawford
- Matthew Shinkle
- Robby Gottesman
- Juanchi

Comments

Turns out the Terminator wasn't programmed to kill Sarah Connor after all; it just wanted clothes, boots, and a motorcycle.

llucos

AI safety researchers are absolutely the last people on earth you want to hear "We were right" from.

vwabi

10:54 "It actually wants something else, and it's capable enough to get it."
Yeah, that _is_ worse.

ShankarSivarajan

Famous last words for species right before they hit the great filter: "Yo, in the test runs, did paperclips max out on the positive attribution heat map, too?"

unvergebeneid

Robert Miles: "We were right"
Me: Oh no
"About inner misalignment"
OH NO

roflrofl

Almost sounds like AIs will need psychologists, too.
"So I tried to acquire that wall..."
"Why not the coin? What is it about the wall that attracts you?"
"Well, in training, I always went to the... oh...huh, never thought about it that way."

bierrollerful

A coin isn't a coin unless it occurs at the edge of the map! We may think the AI is weird for ignoring the heretical middle-of-the-map coin, but that's just our object recognition biases showing.

proskub
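
The comment above is riffing on the coin experiment from the video: during training the coin always sits at the far right of the level, so "go to the coin" and "just go right" earn exactly the same reward, and the agent can end up learning the latter. Below is a minimal, hypothetical sketch of that failure in toy Python (not the researchers' actual environment or code):

# Goal misgeneralisation in miniature: in training the coin is always in the
# rightmost cell, so a "head for the right wall" proxy is rewarded exactly as
# often as the intended "head for the coin" behaviour.
import random

LEVEL_LENGTH = 10

def choose_target(coin_pos, policy):
    """Cell the agent tries to reach, given where the coin actually is."""
    if policy == "proxy":          # learned behaviour: always go to the right wall
        return LEVEL_LENGTH - 1
    return coin_pos                # intended behaviour: go to the coin

def success_rate(coin_positions, policy):
    hits = sum(choose_target(pos, policy) == pos for pos in coin_positions)
    return hits / len(coin_positions)

random.seed(0)
train_levels = [LEVEL_LENGTH - 1] * 1000                                   # coin at the right edge
deploy_levels = [random.randrange(LEVEL_LENGTH - 1) for _ in range(1000)]  # coin anywhere else

for policy in ("intended", "proxy"):
    print(policy, "train:", success_rate(train_levels, policy),
          "deploy:", success_rate(deploy_levels, policy))

Both policies score 1.0 on the training levels; only the intended one keeps scoring 1.0 once the coin can appear mid-level.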

Never mind simple AI, _people_ get misaligned like that quite often. Hoarding is one good example, and it happens both in real life and in games, just like with those keys.

SummerSong

9:00 "We developed interpretability tools to see why programs fail!" "What's going on when they fail?" "Dunno."

No shade, interpretability is hard, even for simple AI :P

charliesteiner
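
For anyone wondering what an "attribution heat map" like the ones mentioned above involves, one simple flavour is occlusion attribution: blank out each part of the observation and record how much the model's output changes. A minimal hypothetical sketch with a stand-in toy model (not the interpretability tooling used in the research):

# Occlusion-style attribution: hide each "pixel" and measure how much the
# model's score drops. Large drops mark the features the model actually uses.
def model_score(obs):
    # Stand-in for a trained network: this toy model fires whenever the
    # pattern [1, 2] appears anywhere in the observation.
    return sum(3.0 for a, b in zip(obs, obs[1:]) if (a, b) == (1, 2))

def occlusion_attribution(obs, baseline=0):
    base = model_score(obs)
    heat = []
    for i in range(len(obs)):
        occluded = list(obs)
        occluded[i] = baseline                   # hide one pixel
        heat.append(base - model_score(occluded))
    return heat

observation = [0, 1, 2, 0, 0, 1, 2, 0]           # hypothetical 1-D "screen"
print(occlusion_attribution(observation))        # [0.0, 3.0, 3.0, 0.0, 0.0, 3.0, 3.0, 0.0]

The heat map lights up over whatever the model is really relying on, which, as the video shows, is not always the thing we intended it to care about.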

Somehow the talk about terminal and instrumental goals made me see a parallel between the AI and us.
As a financial advisor, I have found that many people make this same mistake: money is an instrumental goal, but having spent so much time working to get money, people start to think that money is their terminal goal, so much so that they spend their entire lives chasing money and forget why they wanted the money in the first place.

goonerOZZ

Nothing more terrifying than seeing the title 'We Were Right!' on a Robert Miles video.

bartman

Looking at my hoard of lockpicks in Skyrim, I can confirm that this is perfectly human behavior.

moartems

I feel like this isn't just a problem with artificial intelligence but with intelligence in general. Biological intelligence seems to mismatch terminal goals and instrumental goals all the time, like Pavlovian conditioning training a dog to salivate when it recognizes a bell ringing (what should be the instrumental goal), or humans trading away happiness and well-being (what should be the terminal goal) for money (what should be an instrumental goal).

RichardEntzminger

This is starting to get an "unsolvable problem" vibe. Like we are somehow thinking about this in the wrong way, and current approaches aren't really making good progress.

Practicality

Imagine a future where a very trusted AI agent seems to be doing its job fantastically well for many months or years, and then suddenly goes haywire because its objective was wrong all along but it just hadn't encountered a circumstance where that error became apparent. Then, tragedy!

-

Can't wait for the "We Were Right! Real Misaligned General Superintelligence" video

Turtlerus

"Can you spot the difference?"
Pauses the video and looks for the difference... nothing. Unpauses.
"You can pause the video."
Pauses again and manically looks for a pattern. More keys?
"There's more keys in the deployment. Have you spotted it?"
Yes!!!!

YuureiInu
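
The difference being spotted here, more keys in deployment than in training, is exactly what lets a proxy goal slip through. A rough hypothetical sketch with made-up numbers (not the real environment): opening chests is the intended goal and each chest needs one key, so a key-hoarding proxy behaves identically while keys are scarce and only falls apart once keys outnumber chests.

# Intended goal: open chests (one key each). Possible learned proxy: grab every
# key in sight. With a fixed step budget the two are indistinguishable in
# training, where keys are scarce, and diverge in deployment, where they aren't.
def chests_opened(n_keys, n_chests, step_budget, policy):
    keys_taken = n_keys if policy == "proxy" else min(n_keys, n_chests)
    steps_left = step_budget - keys_taken          # each pickup costs a step
    return max(0, min(n_chests, keys_taken, steps_left))

for label, n_keys, n_chests in [("training (2 keys, 4 chests)", 2, 4),
                                ("deployment (9 keys, 4 chests)", 9, 4)]:
    results = {p: chests_opened(n_keys, n_chests, 10, p) for p in ("intended", "proxy")}
    print(label, results)

In the scarce-key training regime both policies open two chests; in deployment the intended policy opens four while the hoarder burns its budget on keys and opens only one.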

Imagine training a self-driving car in a simulation where plastic bags are always gray and children always wear blue. It then happily runs down a child wearing gray, before slamming on the brakes and throwing the unbuckled passengers through the windshield, for a blue bag on the road.

Houshalter
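
That thought experiment is the same spurious-correlation trap in classifier form. A tiny hypothetical sketch with made-up data (obviously not any real driving stack): in training, colour and category line up perfectly, so a "brake at blue things" rule scores exactly like "brake for children", and only the latter generalises.

# In training, bags are always gray and children always wear blue, so the
# colour proxy and the intended shape rule are indistinguishable until deployment.
def should_brake(obj, rule):
    if rule == "shape":                    # intended feature: what the object is
        return obj["kind"] == "child"
    return obj["colour"] == "blue"         # proxy feature: what colour it is

def accuracy(scenes, rule):
    return sum(should_brake(o, rule) == (o["kind"] == "child") for o in scenes) / len(scenes)

training = [{"kind": "bag", "colour": "gray"},
            {"kind": "child", "colour": "blue"}] * 50
deployment = [{"kind": "bag", "colour": "blue"},        # blue bag on the road
              {"kind": "child", "colour": "gray"}] * 50  # child wearing gray

for rule in ("shape", "colour"):
    print(rule, "train:", accuracy(training, rule), "deploy:", accuracy(deployment, rule))

The colour rule is perfect in simulation and exactly wrong on the road, which is the comment's point.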

The thought of creating a capable agent with the wrong goals is terrifying, actually; and yes, an agent being bad at doing something good is absolutely a much preferable problem to an agent being good at doing something bad.

thecakeredux

This is one of your clearest and most interesting videos to date. I'm now very excited for the interpretability video!

andrewweirny