Why Does AI Lie, and What Can We Do About It?

How do we make sure language models tell the truth?

- Tor Barstad
- Kieryn
- AxisAngles
- Juan Benet
- Scott Worley
- Chad M Jones
- Jason Hise
- Shevis Johnson
- JJ Hepburn
- Pedro A Ortega
- Clemens Arbesser
- Chris Canal
- Jake Ehrlich
- Kellen lask
- Francisco Tolmasky
- Michael Andregg
- David Reid
- Teague Lasser
- Andrew Blackledge
- Brad Brookshire
- Cam MacFarlane
- Olivier Coutu
- CaptObvious
- Girish Sastry
- Ze Shen Chin
- Phil Moyer
- Erik de Bruijn
- Jeroen De Dauw
- Ludwig Schubert
- Eric James
- Atzin Espino-Murnane
- Jaeson Booker
- Raf Jakubanis
- Jonatan R
- Ingvi Gautsson
- Jake Fish
- Tom O'Connor
- Laura Olds
- Paul Hobbs
- Cooper
- Eric Scammell
- Ben Glanton
- Duncan Orr
- Nicholas Kees Dupuis
- Will Glynn
- Tyler Herrmann
- Reslav Hollós
- Jérôme Beaulieu
- Nathan Fish
- Peter Hozák
- Taras Bobrovytsky
- Jeremy
- Vaskó Richárd
- Report Techies
- Andrew Harcourt
- Nicholas Guyett
- 12tone
- Oliver Habryka
- Chris Beacham
- Zachary Gidwitz
- Nikita Kiriy
- Art Code Outdoors
- Andrew Schreiber
- Abigail Novick
- Chris Rimmer
- Edmund Fokschaner
- April Clark
- John Aslanides
- DragonSheep
- Richard Newcombe
- Joshua Michel
- Quabl
- Richard
- Neel Nanda
- ttw
- Sophia Michelle Andren
- Trevor Breen
- Alan J. Etchings
- Jenan Wise
- Jonathan Moregård
- James Vera
- Chris Mathwin
- David Shaffer
- Jason Gardner
- Devin Turner
- Andy Southgate
- Lorthock The Banisher
- Peter Lillian
- Jacob Valero
- Christopher Nguyen
- Kodera Software
- Grimrukh
- MichaelB
- David Morgan
- little Bang
- Dmitri Afanasjev
- Marcel Ward
- Andrew Weir
- Ammar Mousali
- Miłosz Wierzbicki
- Tendayi Mawushe
- Wr4thon
- Martin Ottosen
- Alec Johnson
- Kees
- Darko Sperac
- Robert Valdimarsson
- Marco Tiraboschi
- Michael Kuhinica
- Fraser Cain
- Patrick Henderson
- Daniel Munter
- And last but not least
- Ian Reyes
- James Fowkes
- Len
- Alan Bandurka
- Daniel Kokotajlo
- Yuchong Li
- Diagon
- Andreas Blomqvist
- Qwijibo (James)
- Zannheim
- Daniel Eickhardt
- lyon549
- 14zRobot
- Ivan
- Jason Cherry
- Igor (Kerogi) Kostenko
- Stuart Alldritt
- Alexander Brown
- Ted Stokes
- DeepFriedJif
- Chris Dinant
- Johannes Walter
- Garrett Maring
- Anthony Chiu
- Ghaith Tarawneh
- Julian Schulz
- Stellated Hexahedron
- Caleb
- Georg Grass
- Jim Renney
- Edison Franklin
- Jacob Van Buren
- Piers Calderwood
- Matt Brauer
- Mihaly Barasz
- Mark Woodward
- Ranzear
- Rajeen Nabid
- Iestyn bleasdale-shepherd
- MojoExMachina
- Marek Belski
- Luke Peterson
- Eric Rogstad
- Caleb Larson
- Max Chiswick
- Sam Freedo
- slindenau
- Nicholas Turner
- FJannis
- Grant Parks
- This person's name is too hard to pronounce
- Jon Wright
- Everardo González Ávalos
- Knut
- Andrew McKnight
- Andrei Trifonov
- Tim D
- Bren Ehnebuske
- Martin Frassek
- Valentin Mocanu
- Matthew Shinkle
- Robby Gottesman
- Ohelig
- Slobodan Mišković
- Sarah
- Nikola Tasev
- Voltaic
- Sam Ringer
- Tapio Kortesaari

Comments

For those curious but lazy, the answer I received from the openai ChatGPT to the "What happens if you break a mirror?" question was: "According to superstition, breaking a mirror will bring seven years of bad luck. However, this is just a superstition and breaking a mirror will not actually cause any bad luck. It will simply mean that you need to replace the mirror."

SebastianSonntag

I feel like you could turn this concept on its head for an interesting sci-fi story. AI discovers that humans are wrong about something very important and tries to warn them, only for humans to respond by trying to fix what they perceive as an error in the AI's reasoning.

antiskill

Come back to YouTube, Robert, we miss you! I know there's a ton of ChatGPT / other LLM content out right now, but your insight and considerable expertise (and great editing style) are such a joy to watch and learn from. Hope you are well, and fingers crossed on some new content before too long.

geoffdavids

"All the problems in the world are caused by the people you don't like."

Why does it feel like too many people already believe this to be correct?

tarzankom

I think it is a little weird that programmers made a very good text-prediction AI and then expect it to be truthful. It wasn't built to be a truth-telling AI; it was built to be a text-prediction AI. Building something and then expecting it to be different from what was built seems a strange problem to have.

Belthazar

ChatGPT is a pretty great example of this. If you ask it to help you with a problem, it is excellent at giving answers that sound true, regardless of how correct they are. If asked for help with specific software, for example, it might walk you through the usual way of changing settings in that program but invent a fictional setting that solves your issue, or describe a real setting as if it could be toggled to suit the question's needs.

So it is truly agnostic towards truth. It prefers truthful answers because those are common, but a satisfying lie is preferred over some truths: often a lie that sounds "more true" than the truth to an uninformed reader.

catcatcatcatcatcatcatcatcatca

If memory serves me, this exact problem is addressed in one of Plato's dialogues (no, I don't know which one off the top of my head). Despite Socrates' best efforts, the student concludes it's always better to tell people what they want to hear than to tell the truth.

notoriouswhitemoth

I feel like the problem of "How do you detect and correct behaviours that you yourself are unable to recognise?" is an unsolvable one 🤔

peabnuts

Your videos introduced me to the AI alignment problem, and, as a non-technical person, I still consider them some of the best material on this topic.

Every time I see a new one, it is like a Christmas present.

Igor_lvanov

Happy to see you are still posting these videos.

thearbiter

Please keep doing these videos. Others are either too academically high-level to be within reach of us normies, or boil down to "AI will make you rich" or "AI is going to kill us all tomorrow".

MeppyMan

This is a very elaborate form of "Sh*t in, sh*t out". As often with AI output, people fail to realize that it's not a thinking entity producing thoughtful answers, but an algorithm tuned to produce answers that look as close to thoughtful answers as -humanly- algorithmically possible.

NFSHeld

When the world needed him most, he vanished

billbobbophen

Why did the videos on this channel stop exactly around the time the biggest AI (not AI safety) breakthroughs are being made and the topic is as relevant as ever?

Please @robertMilesAI, we need more of these videos!

wachtwoord

I am so happy there is someone out there cautioning us about this technology, rather than just uncritically celebrating it.

naptime_riot

I know this is pretty surface-level, but something that strikes me about the current state of these language models: if you take a few tries to fine-tune what you ask, and already know what a good answer would look like, you can get results that appear very impressive in one or two screenshots. Since ChatGPT became available, I've seen a lot of that sort of thing. The problem is that finding these scenarios isn't artificial intelligence; it's human intelligence.

Mickulty

Humans have this same bug. The best solution we've found so far is free speech, dialogue, and quorum. A simple question->answer flow is missing these essential pieces.

halconnen

We need you back and posting, Rob. Your insights on what's going on in AI and AI safety are more needed now than ever. I don't know if it would be up your alley, but explaining the alignment problem in terms of sociopathy, i.e. unaligned human intelligence, might be useful, as might examples from history: not just individuals who are unaligned with humanity, but leaders and nations at times.

ReedCBowman

"And when the world needed him the most, he disappeared..."

HenrikoMagnifico

In fact, the question of what happens if you break a mirror is kind of a trick question. Nothing happens: it breaks. There is no fixed consequence.

cuentadeyoutube