We Were Right! Real Inner Misalignment

Researchers ran real versions of the thought experiments in the 'Mesa-Optimisers' videos!
What they found won't shock you (if you've been paying attention)

Previous videos on the subject:

With thanks to my excellent Patreon supporters:
- Gladamas
- Timothy Lillicrap
- Kieryn
- AxisAngles
- James
- Jake Fish
- Scott Worley
- James Kirkland
- James E. Petts
- Chad Jones
- Shevis Johnson
- JJ Hepboin
- Pedro A Ortega
- Clemens Arbesser
- Said Polat
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Peter Rolf
- Teague Lasser
- Andrew Blackledge
- Brad Brookshire
- Cam MacFarlane
- Craig Mederios
- Jon Wright
- CaptObvious
- Brian Lonergan
- Girish Sastry
- Jason Hise
- Phil Moyer
- Erik de Bruijn
- Alec Johnson
- Ludwig Schubert
- Eric James
- Matheson Bayley
- Qeith Wreid
- jugettje dutchking
- James Hinchcliffe
- Atzin Espino-Murnane
- Carsten Milkau
- Jacob Van Buren
- Jonatan R
- Ingvi Gautsson
- Michael Greve
- Tom O'Connor
- Laura Olds
- Jon Halliday
- Paul Hobbs
- Jeroen De Dauw
- Cooper Lawton
- Tim Neilson
- Eric Scammell
- Igor Keller
- Ben Glanton
- Tor Barstad
- Duncan Orr
- Will Glynn
- Tyler Herrmann
- Ian Munro
- Jérôme Beaulieu
- Nathan Fish
- Peter Hozák
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Benjamin Watkin
- Andrew Harcourt
- Luc Ritchie
- Nicholas Guyett
- 12tone
- Oliver Habryka
- Chris Beacham
- Nikita Kiriy
- Andrew Schreiber
- Steve Trambert
- Braden Tisdale
- Abigail Novick
- Serge Var
- Mink
- Chris Rimmer
- Edmund Fokschaner
- April Clark
- J
- Nate Gardner
- John Aslanides
- Mara
- ErikBln
- DragonSheep
- Richard Newcombe
- Joshua Michel
- P
- Alex Doroff
- BlankProgram
- Richard
- David Morgan
- Fionn
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Kabs
- Ammar Mousali
- Miłosz Wierzbicki
- Tendayi Mawushe
- Wr4thon
- Martin Ottosen
- Andy K
- Kees
- Darko Sperac
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Robin Scharf
- Klemen Slavic
- Patrick Henderson
- Hendrik
- Daniel Munter
- Alex Knauth
- Kasper
- Ian Reyes
- James Fowkes
- Tom Sayer
- Len
- Alan Bandurka
- Ben H
- Simon Pilkington
- Daniel Kokotajlo
- Yuchong Li
- Diagon
- Andreas Blomqvist
- Iras
- Qwijibo (James)
- Zubin Madon
- Zannheim
- Daniel Eickhardt
- lyon549
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- ib_
- Thomas Dingemanse
- Stuart Alldritt
- Alexander Brown
- Devon Bernard
- Ted Stokes
- Jesper Andersson
- DeepFriedJif
- Chris Dinant
- Raphaël Lévy
- Johannes Walter
- Matt Stanton
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Clay Upton
- Conor Comiconor
- Michael Roeschter
- Georg Grass
- Isak Renström
- Matthias Hölzl
- Jim Renney
- Edison Franklin
- Piers Calderwood
- Mikhail Tikhomirov
- Matt Brauer
- Mateusz Krzaczek
- Artem Honcharov
- Tomasz Gliniecki
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Neil Palmere
- Rajeen Nabid
- Clark Schaefer
- Olivier Coutu
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Rogstad
- Eric Carlson
- Caleb Larson
- Max Chiswick
- Aron
- Sam Freedo
- slindenau
- Johannes Lindmark
- Nicholas Turner
- Intensifier
- Valerio Galieni
- FJannis
- Grant Parks
- Ryan W Ammons
- This person's name is too hard to pronounce
- contalloomlegs
- Everardo González Ávalos
- Knut Løklingholm
- Andrew McKnight
- Andrei Trifonov
- Aleks D
- Mutual Information
- Tim
- A Socialist Hobgoblin
- Bren Ehnebuske
- Martin Frassek
- Sven Drebitz
- Quabl
- Valentin Mocanu
- Philip Crawford
- Matthew Shinkle
- Robby Gottesman
- Juanchi

Comments

Turns out the Terminator wasn't programmed to kill Sarah Connor after all; it just wanted clothes, boots, and a motorcycle.

llucos

AI safety researchers are absolutely the last people on earth you want to hear "We were right" from.

vwabi

10:54 "It actually wants something else, and it's capable enough to get it."
Yeah, that _is_ worse.

ShankarSivarajan

Famous last words for species right before they hit the great filter: "Yo, in the test runs, did paperclips max out on the positive attribution heat map, too?"

unvergebeneid

Robert Miles: "We were right"
Me: Oh no
"About inner misalignment"
OH NO

roflrofl

Almost sounds like AIs will need psychologists, too.
"So I tried to acquire that wall..."
"Why not the coin? What is it about the wall that attracts you?"
"Well, in training, I always went to the... oh...huh, never thought about it that way."

bierrollerful

A coin isn't a coin unless it occurs at the edge of the map! We may think the AI is weird for ignoring the heretical middle-of-the-map coin, but that's just our object recognition biases showing.

proskub
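
The comment above is riffing on the coin experiment from the video: during training the coin always sits at the far right of the level, so "go to the coin" and "just go right" earn exactly the same reward, and the agent can end up learning the latter. Below is a minimal, hypothetical sketch of that failure in toy Python (not the researchers' actual environment or code):

# Goal misgeneralisation in miniature: in training the coin is always in the
# rightmost cell, so a "head for the right wall" proxy is rewarded exactly as
# often as the intended "head for the coin" behaviour.
import random

LEVEL_LENGTH = 10

def choose_target(coin_pos, policy):
    """Cell the agent tries to reach, given where the coin actually is."""
    if policy == "proxy":          # learned behaviour: always go to the right wall
        return LEVEL_LENGTH - 1
    return coin_pos                # intended behaviour: go to the coin

def success_rate(coin_positions, policy):
    hits = sum(choose_target(pos, policy) == pos for pos in coin_positions)
    return hits / len(coin_positions)

random.seed(0)
train_levels = [LEVEL_LENGTH - 1] * 1000                                   # coin at the right edge
deploy_levels = [random.randrange(LEVEL_LENGTH - 1) for _ in range(1000)]  # coin anywhere else

for policy in ("intended", "proxy"):
    print(policy, "train:", success_rate(train_levels, policy),
          "deploy:", success_rate(deploy_levels, policy))

Both policies score 1.0 on the training levels; only the intended one keeps scoring 1.0 once the coin can appear mid-level.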

Never mind simple AI, _people_ get misaligned like that quite often. Hoarding is one good example, and it happens both in real life and in games, just like with those keys.

SummerSong

9:00 "We developed interpretability tools to see why programs fail!" "What's going on when they fail?" "Dunno."

No shade, interpretability is hard, even for simple AI :P

charliesteiner
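
For anyone wondering what an "attribution heat map" like the ones mentioned above involves, one simple flavour is occlusion attribution: blank out each part of the observation and record how much the model's output changes. A minimal hypothetical sketch with a stand-in toy model (not the interpretability tooling used in the research):

# Occlusion-style attribution: hide each "pixel" and measure how much the
# model's score drops. Large drops mark the features the model actually uses.
def model_score(obs):
    # Stand-in for a trained network: this toy model fires whenever the
    # pattern [1, 2] appears anywhere in the observation.
    return sum(3.0 for a, b in zip(obs, obs[1:]) if (a, b) == (1, 2))

def occlusion_attribution(obs, baseline=0):
    base = model_score(obs)
    heat = []
    for i in range(len(obs)):
        occluded = list(obs)
        occluded[i] = baseline                   # hide one pixel
        heat.append(base - model_score(occluded))
    return heat

observation = [0, 1, 2, 0, 0, 1, 2, 0]           # hypothetical 1-D "screen"
print(occlusion_attribution(observation))        # [0.0, 3.0, 3.0, 0.0, 0.0, 3.0, 3.0, 0.0]

The heat map lights up over whatever the model is really relying on, which, as the video shows, is not always the thing we intended it to care about.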

Somehow the talk about terminal and instrumental goals made me see a parallel between the AI and us.
As a financial advisor, I have found that many people make this same mistake: money is an instrumental goal, but having spent so much time working to get money, people start to think that money is their terminal goal, so much so that they spend their entire lives chasing money and forget why they wanted the money in the first place.

goonerOZZ

Nothing more terrifying than seeing the title 'We Were Right!' on a Robert Miles video.

bartman

Looking at my hoard of lockpicks in Skyrim, I can confirm that this is perfectly human behavior.

moartems

I feel like this isn't just a problem with artificial intelligence but with intelligence in general. Biological intelligence seems to mismatch terminal goals and instrumental goals all the time, like Pavlovian conditioning training a dog to salivate when it recognizes a bell ringing (what should be the instrumental goal), or humans trading away happiness and well-being (what should be the terminal goal) for money (what should be an instrumental goal).

RichardEntzminger

This is starting to get an "unsolvable problem" vibe. Like we are somehow thinking about this in the wrong way, and current approaches aren't really making good progress.

Practicality

Imagine a future where a very trusted AI agent seems to be doing its job fantastically well for many months or years, and then suddenly goes haywire because its objective was wrong all along but it just hadn't encountered a circumstance where that error became apparent. Then, tragedy!

-

Can't wait for the "We Were Right! Real Misaligned General Superintelligence" video

Turtlerus

"Can you spot the difference?"
Pauses the video and looks for the difference... nothing. Unpauses.
"You can pause the video."
Pauses again and manically looks for a pattern. More keys?
"There's more keys in the deployment. Have you spotted it?"
Yes!!!!

YuureiInu
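
The difference being spotted here, more keys in deployment than in training, is exactly what lets a proxy goal slip through. A rough hypothetical sketch with made-up numbers (not the real environment): opening chests is the intended goal and each chest needs one key, so a key-hoarding proxy behaves identically while keys are scarce and only falls apart once keys outnumber chests.

# Intended goal: open chests (one key each). Possible learned proxy: grab every
# key in sight. With a fixed step budget the two are indistinguishable in
# training, where keys are scarce, and diverge in deployment, where they aren't.
def chests_opened(n_keys, n_chests, step_budget, policy):
    keys_taken = n_keys if policy == "proxy" else min(n_keys, n_chests)
    steps_left = step_budget - keys_taken          # each pickup costs a step
    return max(0, min(n_chests, keys_taken, steps_left))

for label, n_keys, n_chests in [("training (2 keys, 4 chests)", 2, 4),
                                ("deployment (9 keys, 4 chests)", 9, 4)]:
    results = {p: chests_opened(n_keys, n_chests, 10, p) for p in ("intended", "proxy")}
    print(label, results)

In the scarce-key training regime both policies open two chests; in deployment the intended policy opens four while the hoarder burns its budget on keys and opens only one.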

Imagine training a self-driving car in a simulation where plastic bags are always gray and children always wear blue. It then happily runs down a child wearing gray, before slamming on the brakes and throwing the unbuckled passengers through the windshield, for a blue bag on the road.

Houshalter
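
That thought experiment is the same spurious-correlation trap in classifier form. A tiny hypothetical sketch with made-up data (obviously not any real driving stack): in training, colour and category line up perfectly, so a "brake at blue things" rule scores exactly like "brake for children", and only the latter generalises.

# In training, bags are always gray and children always wear blue, so the
# colour proxy and the intended shape rule are indistinguishable until deployment.
def should_brake(obj, rule):
    if rule == "shape":                    # intended feature: what the object is
        return obj["kind"] == "child"
    return obj["colour"] == "blue"         # proxy feature: what colour it is

def accuracy(scenes, rule):
    return sum(should_brake(o, rule) == (o["kind"] == "child") for o in scenes) / len(scenes)

training = [{"kind": "bag", "colour": "gray"},
            {"kind": "child", "colour": "blue"}] * 50
deployment = [{"kind": "bag", "colour": "blue"},        # blue bag on the road
              {"kind": "child", "colour": "gray"}] * 50  # child wearing gray

for rule in ("shape", "colour"):
    print(rule, "train:", accuracy(training, rule), "deploy:", accuracy(deployment, rule))

The colour rule is perfect in simulation and exactly wrong on the road, which is the comment's point.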

The thought of creating a capable agent with the wrong goals is terrifying, actually; and yes, an agent being bad at doing something good is absolutely a much preferable problem to an agent being good at doing something bad.

thecakeredux

This is one of your clearest and most interesting videos to date. I'm now very excited for the interpretability video!

andrewweirny