2019 09 19 Stuart Armstrong Research Agenda Online Talk

Humans have a (roughly) shared theory of mind that allows them to model the preferences of other humans from their behaviour. Getting this theory of mind into other agents is highly non-trivial.

Stuart Armstrong shows how this is a consequence of the No Free Lunch result in value learning: you cannot deduce the preferences of a potentially irrational agent by observing its behaviour, and appealing to simplicity doesn't help. He then sketches out his research agenda for learning human preferences despite this impossibility result.
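As a rough illustration of why behaviour alone underdetermines preferences, here is a minimal toy sketch. It is not taken from the talk, and the states, actions, reward table, and function names are all hypothetical. It shows a fully rational agent with one reward function and a fully anti-rational agent with the negated reward function producing exactly the same behaviour.

```python
# Toy sketch: two different (planner, reward) pairs that yield the same policy.
# All names and numbers here are illustrative assumptions, not from the talk.

STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]

# Hypothetical reward table: REWARD[state][action]
REWARD = {
    "s0": {"left": 1.0, "right": 0.0},
    "s1": {"left": 0.0, "right": 2.0},
}


def negate(reward):
    """Return a reward table with every value sign-flipped."""
    return {s: {a: -r for a, r in acts.items()} for s, acts in reward.items()}


def rational_planner(reward):
    """Pick the reward-maximising action in each state."""
    return {s: max(ACTIONS, key=lambda a: reward[s][a]) for s in STATES}


def anti_rational_planner(reward):
    """Pick the reward-minimising action in each state."""
    return {s: min(ACTIONS, key=lambda a: reward[s][a]) for s in STATES}


# Agent A: rational, with reward R.
policy_a = rational_planner(REWARD)
# Agent B: anti-rational, with reward -R.
policy_b = anti_rational_planner(negate(REWARD))

# An observer who only sees the policy cannot tell A from B,
# even though their preferences are exact opposites.
print(policy_a == policy_b)
```

Both agents choose the same action in every state, so an observer who sees only the policy has no way to tell which reward function is the "real" one without extra assumptions about how the agent plans.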
