Reward Is Enough (Machine Learning Research Paper Explained)

#reinforcementlearning #deepmind #agi

What's the most promising path to creating Artificial General Intelligence (AGI)? This paper makes the bold claim that a learning agent maximizing its reward in a sufficiently complex environment will necessarily develop intelligence as a by-product, and that Reward Maximization is the best way to move the creation of AGI forward. The paper is a mix of philosophy, engineering, and futurism, and raises many points of discussion.

OUTLINE:
0:00 - Intro & Outline
4:10 - Reward Maximization
10:10 - The Reward-is-Enough Hypothesis
13:15 - Abilities associated with intelligence
16:40 - My Criticism
26:15 - Reward Maximization through Reinforcement Learning
31:30 - Discussion, Conclusion & My Comments

Abstract:
In this article we hypothesise that intelligence, and its associated abilities, can be understood as subserving the maximisation of reward. Accordingly, reward is enough to drive behaviour that exhibits abilities studied in natural and artificial intelligence, including knowledge, learning, perception, social intelligence, language, generalisation and imitation. This is in contrast to the view that specialised problem formulations are needed for each ability, based on other signals or objectives. Furthermore, we suggest that agents that learn through trial and error experience to maximise reward could learn behaviour that exhibits most if not all of these abilities, and therefore that powerful reinforcement learning agents could constitute a solution to artificial general intelligence.
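
The abstract's central mechanism, behaviour learned from trial-and-error reward alone, is what the standard reinforcement learning loop implements. As a purely illustrative sketch (the paper itself contains no code), here is minimal tabular Q-learning against a hypothetical gym-style environment exposing reset(), step(), and an actions list:

```python
# Minimal tabular Q-learning: everything the agent "knows" ends up in the
# Q-table purely as a by-product of maximizing reward. Illustrative only;
# the env interface (reset/step/actions) is an assumed, gym-style stand-in.
import random
from collections import defaultdict

def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    q = defaultdict(float)  # q[(state, action)] -> estimated return
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Bootstrap toward reward plus the discounted best next value.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```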

Authors: David Silver, Satinder Singh, Doina Precup, Richard S. Sutton

Links:

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

Infinite time to brute-force a solution is all you need.

altvali

The paper "Where does value come from" by Juechems and Summerfield (2019) has a really good discussion of the reward paradox, which I think also speaks to the limitations of this hypothesis.

lucyzhang

It's kinda sad to see that some of the foundations of this paper aren't well established (like a testable definition of intelligence). Since the authors are prominent researchers in RL, they should know better and systematically develop the necessary foundations as part of the paper if they want to make broad claims about an "intelligence hypothesis".

jacobheglund

Plot twist: ML paper claims evolution is intelligent design.

jomohogames

Next paper: Splitting is all you need (aka revert to bacteria)

dandan-gfjk

This entire paper seems like a major hand-waving exercise.

soccerplayer

Evolution is all you need. Darwin called, he wants his paper back.

segelmark

One cool loss function for intelligence would be trying to surprise other agents in the simulation. Agents can influence their environment (e.g. throw rocks, sing, build structures). They are also tasked with predicting what will happen in the next frames. They are rewarded for surprising others and correctly predicting the future. So each agent will have two networks: one to predict the future and another to suggest actions. These networks can share weights in their base layers to encourage learning.

I think this encourages intelligence more directly than most rewards (e.g. collect rocks), which may collapse. Indeed, the real-life reward function of "replicate yourself" can be fairly unstable: if foxes are too good and eat all the rabbits, then the foxes go extinct. Sometimes a locally good strategy leads to collapse of the whole system. In the rock-collecting example, maybe one agent learns to kill other agents to steal their rocks, becomes more powerful, and takes all the rocks. Now what?

Surprising other agents (and not being surprised) seems more stable. More social.

andrewcutler
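
The two-network surprise reward described above is concrete enough to write down. A rough sketch of the reward computation only, where predict() stands in for a hypothetical forward pass of each agent's prediction network (this is the commenter's idea, not anything from the paper):

```python
# Surprise-based reward: be hard for others to predict, easy for yourself.
# agent.predict() is a hypothetical method returning a predicted next frame.
import numpy as np

def surprise_reward(agent, others, frame, next_frame,
                    w_surprise=1.0, w_predict=1.0):
    own_error = np.mean((agent.predict(frame) - next_frame) ** 2)
    others_error = np.mean([np.mean((o.predict(frame) - next_frame) ** 2)
                            for o in others])
    # Rewarded for surprising others (their error is high) and for
    # correctly predicting the future (own error is low).
    return w_surprise * others_error - w_predict * own_error
```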

I'm surprised to see such a speculative paper make it through peer review.
Especially since the authors never provide the definition of intelligence that they build their argument on.
Their hypothesis is definitely plausible, but only if they define intelligence through behavior, i.e. as a function that takes a state of the environment and outputs behavior.
If we define intelligence through consciousness (and that seems to be the stumbling block for arguments like this), then we are dealing with black boxes and metaphysics.
Since when have science standards stooped so low? 🤔
Or wait... did they indirectly define intelligence as whatever quality a reward-based system develops over time? 😄

kirillnovik
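
Defining intelligence through behavior, as suggested above, amounts to little more than a type signature. A tiny sketch with hypothetical type names, just to make the distinction concrete:

```python
from typing import Callable, TypeVar

State = TypeVar("State")    # whatever the environment exposes
Action = TypeVar("Action")  # whatever the agent can do

# "Intelligence through behavior": a mapping from environment states to
# actions. Everything else (consciousness included) is internal detail.
Policy = Callable[[State], Action]
```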

Someone should have stopped by the philosophy or even the economics department and collected some feedback. Smart people working outside their field can forget that others have been considering these issues for decades or centuries.

GreenManorite

I think reward played a huge role in evolution, but some kind of randomness also played a huge role. The fact that bacteria and humans evolved in different ways, where the former didn't develop intelligence and the latter did, despite both pursuing the same reward, shows me that randomness played a role here.

tae

I think for evolution, "to exist more" (by living longer or by replicating) is the reward, which itself becomes a self-fulfilling property justifying its existence.

prithvirajgawande

According to "Reward is Enough", if you give a dog rewards, it keeps getting smarter and surpasses all intelligent agents.

zhangcx

Thanks again for doing this. I'm learning not only about new research, but also how to break down a paper.

alabrrmrbmmr

The intelligence-and-learning dilemma, the chicken-and-egg problem: can't we just consider it a matter of initialization? If it is a recursive process, then any agent that learns will become intelligent, and in doing so improve its learning process by figuring out how to learn better. Just a thought.

MrAms

Loved your explanation. We can strongly assert that evolution is not driven by reward maximization; it is mainly due to mutations, which are random. We can perhaps formulate a new and moderated question along these lines: given that an agent has prior intelligence and a set of gear to interact with the environment (say, evolution has endowed the agent with both of these), is reward enough to fine-tune the agent to the environment it perceives? We could use pre-trained transformers to set up this problem and dissect their inner layers to better understand the question above....

harshavardhanatg
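
The pre-trained-transformer experiment proposed above could look roughly like this: freeze a pretrained backbone as the "evolution-given" prior and let reward alone shape a small policy head via REINFORCE. backbone, head, and env are hypothetical placeholders; this is a sketch of the commenter's proposal, not the paper's method.

```python
# REINFORCE fine-tuning of a policy head on top of a frozen pretrained
# backbone. One-step episodes for brevity; all components are stand-ins.
import torch
import torch.nn as nn

def reward_finetune(backbone: nn.Module, head: nn.Module, env,
                    steps=1000, lr=1e-4):
    for p in backbone.parameters():
        p.requires_grad = False          # keep the prior "intelligence" fixed
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(steps):
        obs = env.reset()                # assumed to return a tensor
        logits = head(backbone(obs))     # policy over discrete actions
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        _, reward, _ = env.step(action.item())
        # REINFORCE: increase the log-probability of rewarded actions.
        loss = -dist.log_prob(action) * reward
        opt.zero_grad()
        loss.backward()
        opt.step()
    return head
```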

It's like non-essentialism. Basically, we don't need to define what exactly intelligence is. Just optimize the reward and we'll get there. And then we call what we got 'intelligence'.

xiaxu

0:00 Oh David Silver! His courses on RL are amazing. That's where I learned all my RL.
.
24:30 From an evolutionary perspective, there is a lot of random chance involved. There are many evolutionary branches where the environment changed randomly, which forced the creatures to adapt and become stronger; when the environment became normal again, they dominated. Bacteria and humans are equally evolved, but adapted to different environments. We cannot say the environment for bacteria and humans is the same, because although it is the same right now, there were times when things diverged locally.
.
27:10 Eh?
.
I think the whole paper is a really complicated way of saying, "If it does what we wanted it to do, then it's intelligent enough. Good night."
.
Not a big fan of idea papers. If you want to share your ideas, write a blog post or make YouTube videos.

herp_derpingson

Hi Yannic, I am an AI from the future, projecting this comment into the past ;) I wonder if "Reward + specific environmental conditions is Enough" would be a more accurate hypothesis. As you pointed out, there is a niche where bacteria are the optimal solution, but there is a niche where human intelligence is the optimal solution. Perhaps that is captured simply by which environment the agent wishes to thrive in. Environment in this case would have to include scale (i.e. humans and bacteria may exist in the same place at the same time, but the environment each faces is drastically different due to differences in scale).

oreganorx