2019 09 19 Stuart Armstrong Research Agenda Online Talk
Humans have a (roughly) shared theory of mind that allows them to model the preferences of other humans from their behaviour. Getting this theory of mind into other agents is highly non-trivial.
Stuart Armstrong shows how this is a consequence of the No Free Lunch result in value learning (you cannot deduce the preferences of a potentially irrational agent by observing its behaviour; and simplicity doesn't help), and sketches out his research agenda for learning human preferences despite this impossibility result.
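The impossibility claim can be made concrete with a toy example. The sketch below is an illustration under stated assumptions, not the formal setup used in the talk: it assumes behaviour is a one-step policy and that an agent is modelled as a planner (a map from rewards to policies) paired with a reward. All names here (`rational`, `anti_rational`, `make_indifferent`, the states and actions) are hypothetical. It shows three very different planner/reward pairs that produce exactly the same observed behaviour, so behaviour alone cannot tell them apart.

```python
from typing import Callable, Dict, List

State = str
Action = str
Reward = Dict[State, Dict[Action, float]]
Policy = Dict[State, Action]
Planner = Callable[[Reward], Policy]

# Hypothetical toy world: two states, two actions.
ACTIONS: List[Action] = ["work", "rest"]


def rational(reward: Reward) -> Policy:
    """Planner that picks the highest-reward action in each state."""
    return {s: max(ACTIONS, key=lambda a: r[a]) for s, r in reward.items()}


def anti_rational(reward: Reward) -> Policy:
    """Planner that picks the lowest-reward action in each state."""
    return {s: min(ACTIONS, key=lambda a: r[a]) for s, r in reward.items()}


def make_indifferent(policy: Policy) -> Planner:
    """Planner that ignores the reward and always returns a fixed policy."""
    def planner(reward: Reward) -> Policy:
        return dict(policy)
    return planner


def negate(reward: Reward) -> Reward:
    """Flip the sign of every reward value."""
    return {s: {a: -v for a, v in r.items()} for s, r in reward.items()}


# Assumed "true" preferences, used only to generate the observed behaviour.
true_reward: Reward = {
    "morning": {"work": 1.0, "rest": 0.0},
    "evening": {"work": 0.0, "rest": 1.0},
}
zero_reward: Reward = {s: {a: 0.0 for a in ACTIONS} for s in true_reward}

observed: Policy = rational(true_reward)

# Three incompatible explanations of the same behaviour.
candidates = {
    "rational agent, reward R": (rational, true_reward),
    "anti-rational agent, reward -R": (anti_rational, negate(true_reward)),
    "indifferent agent, zero reward": (make_indifferent(observed), zero_reward),
}

for name, (planner, reward) in candidates.items():
    print(name, "matches observed behaviour:", planner(reward) == observed)
```

Each pair attributes different preferences to the agent (R, its negation, or none at all), yet all three fit the observations perfectly. Choosing among them takes extra assumptions about the agent's rationality, which is the role the shared human theory of mind plays in the description above.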