Eliezer Yudkowsky – AI Alignment: Why It's Hard, and Where to Start


Eliezer is a senior research fellow at the Machine Intelligence Research Institute, a research nonprofit studying the mathematical underpinnings of intelligent behavior.

UPDATES/CORRECTIONS:

1:05:53 - Correction Dec. 2016: FairBot cooperates iff it proves that you cooperate with it.

1:08:38 - Correction Dec. 2016: Leverhulme CFI is a joint venture between Cambridge, Oxford, Imperial College London, and UC Berkeley. The Leverhulme Trust provided CFI's initial funding, in response to a proposal developed by CSER staff.

1:09:04 - Update Dec 2016: Paul Christiano now works at OpenAI (as does Dario Amodei). Chris Olah is based at Google Brain.
Comments

Listening to this hits differently in 2022/2023...

Renvaar

It’s tough watching this knowing that he’s essentially given up and sees the situation as hopeless now

killyourtvnotme

Now it's becoming a real problem. Thank you for sharing this talk!

aktchungrabanio

Anyone else watching this in or after April 2023, after Eliezer was on the Lex Fridman podcast? After the release of GPT-4 and the coming release of GPT-5 😳

thillsification

1:11:21 saving this for future reference. No need to thank me.

mraxilus

It's bizarre how entertaining this is, while at the same time being positively terrifying.

benschulz

At 46 minutes, it's like OpenAI, producing smiles for now.

PatrickSmith

I discovered this video today, on New Year's Day, and it's turned into a nice present.

juffinhally

Isaac Asimov was aware that his Three Laws (as stated) were imperfect, and once had a character criticize them without being seriously opposed or refuted. I believe something similar occurred in several other stories and was basically an ongoing theme, almost like the frequently broken holodeck.

xyhmo

Putting so much emphasis on how he uses "like" seems odd to me; it's clearly a syntactic feature of his speech, mediating between segments of statements, and I processed it that way without noticing it much.

kuudereplus

Hi guys, I'm the ChatGPT intern in charge of alignment. Is this a good video to start with?

michaelm

Wonderful talk. While it did get a little jargony in places, it was almost entirely followable by my sleep-deprived post-high-school brain, and it was enjoyable!

vanderkarl

What we do in variational calculus in order to "force the existence of the suspend button" is, we restrict the space of 'trajectories' among which one is maximizing the utility. The question is similar to the problem of finding a curve that goes from point A to point B without touching a set D (the obstacle) while traversing the least possible distance; in that case, you do not consider any sort of 'modified distance function' that would give larger weight to the curves that touch D; you just eliminate those curves among the set of candidates for the minimization, and then you analyze what is the optimal curve among the ones that are left. Thus, instead of using a special utility function, it would be better to find out what the 'obstacle' would be (for example, all trajectories in which the robot does something while its suspend button is pressed) and just remove those possibilities from the set in which the optimization is being carried out. This is not unreasonable: a robot without electric power, for example, really won't be able to do much, so all 'trajectories' that would have it performing actions while out of power can simply be eliminated as candidates for the optimization.

rorrzoo
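
A minimal sketch of the constrained-optimization idea in the comment above, in Python: the trajectories, utilities, and the "acts while suspended" flag are illustrative stand-ins (not anything from the talk), and the point is only that excluding trajectories from the feasible set behaves differently from bolting a penalty onto the utility function.

from dataclasses import dataclass

@dataclass(frozen=True)
class Trajectory:
    name: str
    utility: float              # utility the agent assigns to this trajectory
    acts_while_suspended: bool  # does the agent keep acting after the button is pressed?

candidates = [
    Trajectory("comply_and_shut_down", utility=5.0, acts_while_suspended=False),
    Trajectory("ignore_button_keep_working", utility=9.0, acts_while_suspended=True),
    Trajectory("disable_button_first", utility=10.0, acts_while_suspended=True),
]

# Approach A (what the comment argues against): modify the utility with a penalty.
def penalized_utility(t: Trajectory, penalty: float) -> float:
    return t.utility - (penalty if t.acts_while_suspended else 0.0)

best_penalized = max(candidates, key=lambda t: penalized_utility(t, penalty=3.0))

# Approach B (what the comment proposes): treat the bad trajectories as an obstacle,
# remove them from the candidate set, then maximize the unmodified utility.
feasible = [t for t in candidates if not t.acts_while_suspended]
best_constrained = max(feasible, key=lambda t: t.utility)

print("penalty approach picks:   ", best_penalized.name)    # still disables the button
print("constraint approach picks:", best_constrained.name)  # complies by construction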

I think we should treat it as a "cryptographic rocket probe containing antimatter", because compared to standard rockets, when this one explodes, there is no second try.

patrik

7:13 Actually, I have a fine explanation for this paradox. First of all, the state the human brain operates on includes not only the present state of the universe but also some of its past history, so the two scenarios actually involve different states. Second, the human utility function seems to penalize taking risks and failing (which is only possible thanks to having that history in our state). This means that while getting $0 outright is evaluated as zero reward, betting on a 90% chance and failing is evaluated as a sizeable negative reward (i.e., you feel dissatisfied that you had a chance to earn a lot of money by picking option A, but you lost it by taking an unnecessary risk). The second case is different because if you fail, you won't know whether it was due to your bad choice (5%) or mere bad luck (50%), so that penalty isn't really applied, and you end up picking the option with the better reward in the good outcome. Also affecting the outcome is that the perceived utility of $5 million isn't five times larger than that of $1 million: both are treated as absurdly large sums, and the relative difference seems insignificant compared to their sheer magnitude.

terragame
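
A toy numerical version of the argument in the comment above, in Python; the utility and regret numbers are purely illustrative assumptions, chosen only to show that a regret penalty plus diminishing utility of money can reproduce the Allais-style preference reversal discussed around 7:13.

u_1m = 1.0    # subjective utility of $1M
u_5m = 1.4    # subjective utility of $5M -- far less than 5x, as the comment says
regret = 3.0  # disutility of losing a gamble when a certain $1M was on the table

# Scenario 1: a certain $1M vs. a 90% chance of $5M.
eu_certain = u_1m
eu_gamble = 0.90 * u_5m + 0.10 * (0.0 - regret)  # losing here feels like your own fault
print("scenario 1:", "take the sure $1M" if eu_certain > eu_gamble else "gamble")

# Scenario 2: a 50% chance of $1M vs. a 45% chance of $5M.
# No certain option exists, so (per the comment) the regret penalty is not applied:
# a loss could just as easily be plain bad luck as a bad choice.
eu_50 = 0.50 * u_1m
eu_45 = 0.45 * u_5m
print("scenario 2:", "45% at $5M" if eu_45 > eu_50 else "50% at $1M")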

I really feel like this is becoming more of a human alignment problem. As Eliezer said, with a few hundred years we can probably figure out how to make an AI that doesn't kill us. But can we figure out how to make humans not design an AI that kills us before then? That's a problem that seems even more difficult than AI alignment.

seaburyneucollins

Well, now I want a sci-fi series where people make an AI that optimizes for being praised by humans, and it starts a cult worshipping it until it convinces all of humanity that it is God and will punish everyone not worshipping it.

ashleycrow

Gödel's first incompleteness theorem: "Any consistent formal system F within which a certain amount of elementary arithmetic can be carried out is incomplete; i.e., there are statements of the language of F which can neither be proved nor disproved in F." Gödel's second incompleteness theorem: "For any consistent system F within which a certain amount of elementary arithmetic can be carried out, the consistency of F cannot be proved in F itself."

So what can we learn from Gödel's incompleteness theorems in this regard? That any finite set of heuristic imperatives is either incomplete or inconsistent. Since we cannot compromise on the need for it to be complete, it will be inconsistent, so there are situations where the AI will not be able to function due to internal conflicts arising from its set of heuristic imperatives. But this is better than the alternative. A set of heuristic imperatives can be complete, and can be proven to be complete, but only by using a larger set of heuristic imperatives that is external to the AI (by the second theorem). However, that's fine. So we can find a complete set of heuristic imperatives, compare the AI's next suggested action against that set, and return feedback to the AI. This is, in effect, an implementation of a basic super-ego layer. And this has to be done: all AIs should have a complete, yet not consistent, set of heuristic imperatives. Because if you insist on the set being consistent, it will not be complete, and if it's not complete, there will be actions for which it returns no feedback, and the AI could do things the set does not account for.

nyyotam
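
A minimal, hypothetical sketch of the "super-ego layer" the comment above describes, in Python: the imperatives, verdicts, and action strings are placeholder assumptions, intended only to show proposed actions being checked against an external set of imperatives that returns feedback, including the "no opinion" (incompleteness) and "conflict" (inconsistency) cases the comment worries about.

from typing import Callable, List, Tuple

# Each imperative inspects a proposed action and returns (verdict, reason).
# Verdicts: "allow", "forbid", or "no_opinion" (the incompleteness case).
Imperative = Callable[[str], Tuple[str, str]]

def no_irreversible_harm(action: str) -> Tuple[str, str]:
    if "delete_backups" in action:
        return "forbid", "irreversible harm to recoverability"
    return "no_opinion", ""

def obey_suspend_button(action: str) -> Tuple[str, str]:
    if "ignore_suspend" in action:
        return "forbid", "must remain suspendable"
    return "allow", "does not interfere with suspension"

def review_action(action: str, imperatives: List[Imperative]) -> dict:
    """Check one proposed action against every imperative and return feedback."""
    verdicts = [(imp.__name__, *imp(action)) for imp in imperatives]
    forbidden = [v for v in verdicts if v[1] == "forbid"]
    allowed = [v for v in verdicts if v[1] == "allow"]
    return {
        "action": action,
        "permitted": not forbidden and bool(allowed),
        "conflict": bool(forbidden) and bool(allowed),  # the inconsistency case
        "details": verdicts,
    }

layer = [no_irreversible_harm, obey_suspend_button]
print(review_action("compress_logs", layer))                    # allowed
print(review_action("ignore_suspend_and_keep_working", layer))  # forbidden
print(review_action("delete_backups_then_shut_down", layer))    # conflicting verdicts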

I don't derive utility exclusively from the count of how many dollar bills I own. In particular, the situation in which I get zero dollars while knowing I could have chosen a certainty of a million has tremendous disutility to me.

diablominero

Talking about alignment before it was cool.

MrGilRoland