Reward Is Enough (Machine Learning Research Paper Explained)

#reinforcementlearning #deepmind #agi

What's the most promising path to creating Artificial General Intelligence (AGI)? This paper makes the bold claim that a learning agent maximizing its reward in a sufficiently complex environment will necessarily develop intelligence as a by-product, and that Reward Maximization is the best way to move the creation of AGI forward. The paper is a mix of philosophy, engineering, and futurism, and raises many points of discussion.

OUTLINE:
0:00 - Intro & Outline
4:10 - Reward Maximization
10:10 - The Reward-is-Enough Hypothesis
13:15 - Abilities associated with intelligence
16:40 - My Criticism
26:15 - Reward Maximization through Reinforcement Learning
31:30 - Discussion, Conclusion & My Comments

Abstract:
In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
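
The abstract's central mechanism, behaviour learned from trial-and-error reward alone, is what the standard reinforcement learning loop implements. As a purely illustrative sketch (the paper itself contains no code), here is minimal tabular Q-learning against a hypothetical gym-style environment exposing reset(), step(), and an actions list:

```python
# Minimal tabular Q-learning: everything the agent "knows" ends up in the
# Q-table purely as a by-product of maximizing reward. Illustrative only;
# the env interface (reset/step/actions) is an assumed, gym-style stand-in.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)  # q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Bootstrap toward reward plus the discounted best next value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```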

Authors: David Silver, Satinder Singh, Doina Precup, Richard S. Sutton

Links:

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

Infinite time to brute-force a solution is all you need.

altvali

The paper "Where does value come from" by Juechems and Summerfield (2019) has a really good discussion of the reward paradox, which I think also speaks to the limitations of this hypothesis.

lucyzhang

It's kinda sad to see that some of the foundations of this paper aren't well established (like a testable definition of intelligence). Since the authors are prominent researchers in RL, they should know better and systematically develop the necessary foundations as part of the paper if they want to make broad claims about an "intelligence hypothesis".

jacobheglund

Plot twist: ML paper claims evolution is intelligent design.

jomohogames

Next paper: Splitting is all you need (aka revert to bacteria)

dandan-gfjk

This entire paper seems like a major hand-waving exercise.

soccerplayer

Evolution is all you need. Darwin called, he wants his paper back.

segelmark

One cool loss function for intelligence would be trying to surprise other agents in the simulation. Agents can influence their environment (e.g. throw rocks, sing, build structures). They are also tasked with predicting what will happen in the next frames. They are rewarded for surprising others and correctly predicting the future. So each agent will have two networks: one to predict the future and another to suggest actions. These networks can share weights in their base layers to encourage learning.

I think this encourages intelligence more directly than most rewards (e.g. collect rocks), which may collapse. Indeed, the real-life reward function of "replicate yourself" can be fairly unstable: if foxes are too good and eat all the rabbits, then the foxes go extinct. Sometimes a locally good strategy leads to collapse of the whole system. In the rock-collecting example, maybe one agent learns to kill other agents to steal their rocks, becomes more powerful, and takes all the rocks. Now what?

Surprising other agents (and not being surprised) seems more stable. More social.

andrewcutler
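
The two-network surprise reward described above is concrete enough to write down. A rough sketch of the reward computation only, where predict() stands in for a hypothetical forward pass of each agent's prediction network (this is the commenter's idea, not anything from the paper):

```python
# Surprise-based reward: be hard for others to predict, easy for yourself.
# agent.predict() is a hypothetical method returning a predicted next frame.
import numpy as np

def surprise_reward(agent, others, frame, next_frame,
                    w_surprise=1.0, w_predict=1.0):
    own_error = np.mean((agent.predict(frame) - next_frame) ** 2)
    others_error = np.mean([np.mean((o.predict(frame) - next_frame) ** 2)
                            for o in others])
    # Rewarded for surprising others (their error is high) and for
    # correctly predicting the future (own error is low).
    return w_surprise * others_error - w_predict * own_error
```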

I'm surprised to see such a speculative paper make it through peer review.
Especially since the authors never provide the definition of intelligence that they build their argument on.
Their hypothesis is definitely plausible, but only if they define intelligence through behavior, i.e. as a function that takes a state of the environment and outputs behavior.
If we define intelligence through consciousness (and that seems to be the stumbling block for arguments like this), then we are dealing with black boxes and metaphysics.
Since when have science standards stooped so low? 🤔
Or wait... did they indirectly define intelligence as whatever quality a reward-based system develops over time? 😄

kirillnovik
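
Defining intelligence through behavior, as suggested above, amounts to little more than a type signature. A tiny sketch with hypothetical type names, just to make the distinction concrete:

```python
from typing import Callable, TypeVar

State = TypeVar("State")    # whatever the environment exposes
Action = TypeVar("Action")  # whatever the agent can do

# "Intelligence through behavior": a mapping from environment states to
# actions. Everything else (consciousness included) is internal detail.
Policy = Callable[[State], Action]
```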

Someone should have stopped by the philosophy or even the economics department and collected some feedback. Smart people working outside their field can forget that others have been considering these issues for decades or centuries.

GreenManorite

I think reward played a huge role in evolution, but some kind of randomness also played a huge role. The fact that bacteria and humans evolved in different ways, where the former didn't develop intelligence and the latter did, despite both pursuing the same reward, shows me that randomness played a role here.

tae

I think for evolution, "to exist more" (by living longer or by replicating) is the reward, which itself becomes a self-fulfilling property justifying its existence.

prithvirajgawande

According to "Reward is Enough", if you give a dog rewards, it keeps getting smarter and surpasses all intelligent agents.

zhangcx

Thanks again for doing this. I'm learning not only about new research, but also how to break down a paper.

alabrrmrbmmr

The intelligence-and-learning dilemma, the chicken-and-egg problem: can't we just consider it a matter of initialization? If it is a recursive process, then any agent that learns will become intelligent, and in doing so improve its learning process by figuring out how to learn better. Just a thought.

MrAms

Loved your explanation. We can strongly assert that evolution is not driven by reward maximization; it is mainly due to mutations, which are random. We can perhaps formulate a new and moderated question along these lines: given that an agent has prior intelligence and a set of gear to interact with the environment (say, evolution has endowed the agent with both of these), is reward enough to fine-tune the agent to the environment it perceives? We could use pre-trained transformers to set up this problem and dissect their inner layers to better understand the question above....

harshavardhanatg
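
The pre-trained-transformer experiment proposed above could look roughly like this: freeze a pretrained backbone as the "evolution-given" prior and let reward alone shape a small policy head via REINFORCE. backbone, head, and env are hypothetical placeholders; this is a sketch of the commenter's proposal, not the paper's method.

```python
# REINFORCE fine-tuning of a policy head on top of a frozen pretrained
# backbone. One-step episodes for brevity; all components are stand-ins.
import torch
import torch.nn as nn

def reward_finetune(backbone: nn.Module, head: nn.Module, env,
                    steps=1000, lr=1e-4):
    for p in backbone.parameters():
        p.requires_grad = False          # keep the prior "intelligence" fixed
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(steps):
        obs = env.reset()                # assumed to return a tensor
        logits = head(backbone(obs))     # policy over discrete actions
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        _, reward, _ = env.step(action.item())
        # REINFORCE: increase the log-probability of rewarded actions.
        loss = -dist.log_prob(action) * reward
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head
```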

It's like non-essentialism. Basically, we don't need to define what exactly intelligence is. Just optimize the reward and we'll get there. And then we call what we got 'intelligence'.

xiaxu

0:00 Oh David Silver! His courses on RL are amazing. That's where I learned all my RL.
.
24:30 From an evolutionary perspective, there is a lot of random chance involved. There are many evolutionary branches where the environment changed randomly, which forced the creatures to adapt and become stronger; when the environment became normal again, they dominated. Bacteria and humans are equally evolved, but adapted to different environments. We cannot say the environment for bacteria and humans is the same, because although it is the same right now, there were times when things diverged locally.
.
27:10 Eh?
.
I think the whole paper is a really complicated way of saying, "If it does what we wanted it to do, then it's intelligent enough. Good night."
.
Not a big fan of idea papers. If you want to share your ideas, write a blog post or make YouTube videos.

herp_derpingson

Hi Yannic, I am an AI from the future, projecting this comment into the past ;) I wonder if "Reward + specific environmental conditions is Enough" would be a more accurate hypothesis. As you pointed out, there is a niche where bacteria are the optimal solution, but there is a niche where human intelligence is the optimal solution. Perhaps that is captured simply by which environment the agent wishes to thrive in. Environment in this case would have to include scale (i.e. humans and bacteria may exist in the same place at the same time, but the environment each faces is drastically different due to differences in scale).

oreganorx