Eliezer Yudkowsky – AI Alignment: Why It's Hard, and Where to Start


Eliezer is a senior research fellow at the Machine Intelligence Research Institute, a research nonprofit studying the mathematical underpinnings of intelligent behavior.

UPDATES/CORRECTIONS:

1:05:53 - Correction Dec. 2016: FairBot cooperates iff it proves that you cooperate with it.

1:08:38 - Correction Dec. 2016: Leverhulme CFI is a joint venture between Cambridge, Oxford, Imperial College London, and UC Berkeley. The Leverhulme Trust provided CFI's initial funding, in response to a proposal developed by CSER staff.

1:09:04 - Update Dec 2016: Paul Christiano now works at OpenAI (as does Dario Amodei). Chris Olah is based at Google Brain.
Comments

Listening to this hits differently in 2022/2023...

Renvaar

It’s tough watching this knowing that he’s essentially given up and sees the situation as hopeless now

killyourtvnotme

Now it's becoming a real problem. Thank you for sharing this talk!

aktchungrabanio

Anyone else watching this in or after April 2023, after Eliezer was on the Lex Fridman podcast? After the release of GPT-4 and the coming release of GPT-5 😳

thillsification

1:11:21 saving this for future reference. No need to thank me.

mraxilus

It's bizarre how entertaining this is, while at the same time being positively terrifying.

benschulz

At 46 minutes, it's like OpenAI, producing smiles for now.

PatrickSmith

I discovered this video today, on New Year's Day, and it's turned into a nice present.

juffinhally

Isaac Asimov was aware that his Three Laws (as stated) were imperfect, and once had a character criticize them without being seriously opposed or refuted. I believe something similar occurred in several other stories and was basically an ongoing theme, almost like the frequently broken holodeck.

xyhmo

Putting so much emphasis on how he uses "like" seems odd to me; it's clearly a syntactic feature of his speech, mediating between segments of statements, and I processed it that way without noticing it much.

kuudereplus

Hi guys, I'm the ChatGPT intern in charge of alignment. Is this a good video to start with?

michaelm

Wonderful talk. While it did get a little jargony in places, it was almost entirely followable by my sleep-deprived post-high-school brain, and it was enjoyable!

vanderkarl

What we do in variational calculus in order to "force the existence of the suspend button" is, we restrict the space of 'trajectories' among which one is maximizing the utility. The question is similar to the problem of finding a curve that goes from point A to point B without touching a set D (the obstacle) while traversing the least possible distance; in that case, you do not consider any sort of 'modified distance function' that would give larger weight to the curves that touch D; you just eliminate those curves among the set of candidates for the minimization, and then you analyze what is the optimal curve among the ones that are left. Thus, instead of using a special utility function, it would be better to find out what the 'obstacle' would be (for example, all trajectories in which the robot does something while its suspend button is pressed) and just remove those possibilities from the set in which the optimization is being carried out. This is not unreasonable: a robot without electric power, for example, really won't be able to do much, so all 'trajectories' that would have it performing actions while out of power can simply be eliminated as candidates for the optimization.

rorrzoo
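
A minimal sketch of the constrained-optimization idea in the comment above, in Python: the trajectories, utilities, and the "acts while suspended" flag are illustrative stand-ins (not anything from the talk), and the point is only that excluding trajectories from the feasible set behaves differently from bolting a penalty onto the utility function.

from dataclasses import dataclass

@dataclass(frozen=True)
class Trajectory:
    name: str
    utility: float              # utility the agent assigns to this trajectory
    acts_while_suspended: bool  # does the agent keep acting after the button is pressed?

candidates = [
    Trajectory("comply_and_shut_down", utility=5.0, acts_while_suspended=False),
    Trajectory("ignore_button_keep_working", utility=9.0, acts_while_suspended=True),
    Trajectory("disable_button_first", utility=10.0, acts_while_suspended=True),
]

# Approach A (what the comment argues against): modify the utility with a penalty.
def penalized_utility(t: Trajectory, penalty: float) -> float:
    return t.utility - (penalty if t.acts_while_suspended else 0.0)

best_penalized = max(candidates, key=lambda t: penalized_utility(t, penalty=3.0))

# Approach B (what the comment proposes): treat the bad trajectories as an obstacle,
# remove them from the candidate set, then maximize the unmodified utility.
feasible = [t for t in candidates if not t.acts_while_suspended]
best_constrained = max(feasible, key=lambda t: t.utility)

print("penalty approach picks:   ", best_penalized.name)    # still disables the button
print("constraint approach picks:", best_constrained.name)  # complies by construction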

I think we should treat it as a "cryptographic rocket probe containing antimatter", because compared to standard rockets, when this one explodes, there is no second try.

patrik

7:13 Actually, I have a fine explanation for this paradox. First of all, the state the human brain operates on includes not only the present state of the universe but also some of its past history, so the two scenarios actually involve different states. Second, the human utility function seems to penalize taking risks and failing (which is only possible thanks to having that history in our state). This means that while getting $0 outright is evaluated as zero reward, betting on a 90% chance and failing is evaluated as a sizeable negative reward (i.e., you feel dissatisfied that you had a chance to earn a lot of money by picking option A, but you lost it by taking an unnecessary risk). The second case is different because if you fail, you won't know whether it was due to your bad choice (5%) or mere bad luck (50%), so that penalty isn't really applied, and you end up picking the option with the better reward in the good outcome. Also affecting the outcome is that the perceived utility of $5 million isn't five times larger than that of $1 million: both are treated as absurdly large sums, and the relative difference seems insignificant compared to their sheer magnitude.

terragame
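
A toy numerical version of the argument in the comment above, in Python; the utility and regret numbers are purely illustrative assumptions, chosen only to show that a regret penalty plus diminishing utility of money can reproduce the Allais-style preference reversal discussed around 7:13.

u_1m = 1.0    # subjective utility of $1M
u_5m = 1.4    # subjective utility of $5M -- far less than 5x, as the comment says
regret = 3.0  # disutility of losing a gamble when a certain $1M was on the table

# Scenario 1: a certain $1M vs. a 90% chance of $5M.
eu_certain = u_1m
eu_gamble = 0.90 * u_5m + 0.10 * (0.0 - regret)  # losing here feels like your own fault
print("scenario 1:", "take the sure $1M" if eu_certain > eu_gamble else "gamble")

# Scenario 2: a 50% chance of $1M vs. a 45% chance of $5M.
# No certain option exists, so (per the comment) the regret penalty is not applied:
# a loss could just as easily be plain bad luck as a bad choice.
eu_50 = 0.50 * u_1m
eu_45 = 0.45 * u_5m
print("scenario 2:", "45% at $5M" if eu_45 > eu_50 else "50% at $1M")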

I really feel like this is becoming more of a human alignment problem. As Eliezer said, with a few hundred years we can probably figure out how to make an AI that doesn't kill us. But can we figure out how to make humans not design an AI that kills us before then? That's a problem that seems even more difficult than AI alignment.

seaburyneucollins

Well, now I want a sci-fi series where people make an AI that optimizes for being praised by humans, and it starts a cult worshipping it until it convinces all of humanity that it is God and will punish everyone not worshipping it.

ashleycrow

Gödel's first incompleteness theorem: "Any consistent formal system F within which a certain amount of elementary arithmetic can be carried out is incomplete; i.e., there are statements of the language of F which can neither be proved nor disproved in F." Gödel's second incompleteness theorem: "For any consistent system F within which a certain amount of elementary arithmetic can be carried out, the consistency of F cannot be proved in F itself."

So what can we learn from Gödel's incompleteness theorems in this regard? That any finite set of heuristic imperatives is either incomplete or inconsistent. Since we cannot compromise on the need for it to be complete, it will be inconsistent, so there are situations where the AI will not be able to function due to internal conflicts arising from its set of heuristic imperatives. But this is better than the alternative. A set of heuristic imperatives can be complete, and can be proven to be complete, but only by using a larger set of heuristic imperatives that is external to the AI (by the second theorem). However, that's fine. So we can find a complete set of heuristic imperatives, compare the AI's next suggested action against that set, and return feedback to the AI. This is, in effect, an implementation of a basic super-ego layer. And this has to be done: all AIs should have a complete, yet not consistent, set of heuristic imperatives. Because if you insist on the set being consistent, it will not be complete, and if it's not complete, there will be actions for which it returns no feedback, and the AI could do things the set does not account for.

nyyotam
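
A minimal, hypothetical sketch of the "super-ego layer" the comment above describes, in Python: the imperatives, verdicts, and action strings are placeholder assumptions, intended only to show proposed actions being checked against an external set of imperatives that returns feedback, including the "no opinion" (incompleteness) and "conflict" (inconsistency) cases the comment worries about.

from typing import Callable, List, Tuple

# Each imperative inspects a proposed action and returns (verdict, reason).
# Verdicts: "allow", "forbid", or "no_opinion" (the incompleteness case).
Imperative = Callable[[str], Tuple[str, str]]

def no_irreversible_harm(action: str) -> Tuple[str, str]:
    if "delete_backups" in action:
        return "forbid", "irreversible harm to recoverability"
    return "no_opinion", ""

def obey_suspend_button(action: str) -> Tuple[str, str]:
    if "ignore_suspend" in action:
        return "forbid", "must remain suspendable"
    return "allow", "does not interfere with suspension"

def review_action(action: str, imperatives: List[Imperative]) -> dict:
    """Check one proposed action against every imperative and return feedback."""
    verdicts = [(imp.__name__, *imp(action)) for imp in imperatives]
    forbidden = [v for v in verdicts if v[1] == "forbid"]
    allowed = [v for v in verdicts if v[1] == "allow"]
    return {
        "action": action,
        "permitted": not forbidden and bool(allowed),
        "conflict": bool(forbidden) and bool(allowed),  # the inconsistency case
        "details": verdicts,
    }

layer = [no_irreversible_harm, obey_suspend_button]
print(review_action("compress_logs", layer))                    # allowed
print(review_action("ignore_suspend_and_keep_working", layer))  # forbidden
print(review_action("delete_backups_then_shut_down", layer))    # conflicting verdicts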

I don't derive utility exclusively from the count of how many dollar bills I own. In particular, the situation in which I get zero dollars while knowing I could have chosen a certainty of a million has tremendous disutility to me.

diablominero

Talking about alignment before it was cool.

MrGilRoland