Enhanced POET: Open-Ended RL through Unbounded Invention of Learning Challenges and their Solutions

Enhanced POET makes substantial, well-crafted improvements to the original POET algorithm and, by its authors' account, delivers the most open-ended algorithmic demonstration to date.

Abstract:
Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of limitations of the algorithm itself and because of external issues including a limited problem space and lack of a universal progress measure. Importantly, both limitations pose impediments not only for POET, but for the pursuit of open-endedness in general. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations designed to help elucidate its full potential. Together, these four advances enable the most open-ended algorithmic demonstration to date. The algorithmic innovations are (1) a domain-general measure of how meaningfully novel new challenges are, enabling the system to potentially create and solve interesting challenges endlessly, and (2) an efficient heuristic for determining when agents should goal-switch from one problem to another (helping open-ended search better scale). Outside the algorithm itself, to enable a more definitive demonstration of open-endedness, we introduce (3) a novel, more flexible way to encode environmental challenges, and (4) a generic measure of the extent to which a system continues to exhibit open-ended innovation. Enhanced POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved through other means.
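To make innovation (1) more concrete, here is a minimal Python sketch of one way a domain-general novelty measure could work. The abstract does not spell out the computation, so everything here is an assumption made for illustration: each environment is characterized by how it ranks a fixed set of agents by score, and its novelty is the distance from that rank vector to the nearest rank vectors already seen. The function names, the clipping bounds, the tie handling, and the neighbor count `k` are all hypothetical.

```python
import math


def rank_normalize(scores, low, high):
    """Clip raw agent scores, then map each to its rank scaled into [-0.5, 0.5].

    Assumption: ties share the average of their ranks, so agents that are
    clipped to the same bound are treated identically.
    """
    clipped = [min(max(s, low), high) for s in scores]
    n = len(clipped)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: clipped[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of equal clipped scores.
        while j + 1 < n and clipped[order[j + 1]] == clipped[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    return [r / (n - 1) - 0.5 for r in ranks]


def novelty(candidate, archive, k=2):
    """Mean Euclidean distance from candidate to its k nearest archive vectors."""
    if not archive:
        return math.inf  # nothing to compare against yet
    dists = sorted(math.dist(candidate, other) for other in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


# Hypothetical scores of the same three agents on three environments.
env_a = rank_normalize([300, 50, 120], low=50, high=300)
env_b = rank_normalize([310, 40, 130], low=50, high=300)  # same agent ordering as env_a
env_c = rank_normalize([60, 280, 120], low=50, high=300)  # ordering of agents 0 and 1 flipped

print(novelty(env_b, [env_a], k=1))  # 0.0: raw scores differ, ranking does not
print(novelty(env_c, [env_a, env_b], k=1))  # about 1.414: the ranking changed
```

Under these assumptions, an environment counts as novel only if it reorders the agents, not merely if it shifts their raw scores. That is what would let such a measure apply to any domain where agents can be scored.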

Authors: Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley

Comments

Great video! It would be nice if you could review the paper "On the Measure of Intelligence" by François Chollet. That would be a neat segue and helpful for researchers in this field.

alibaheri

The environment novelty metric is interesting. On the surface, it sounds like it should work well. But I feel like maybe it requires a bit more convincing? Both the environments and the agents are created algorithmically. Since the agents are used to judge the environments, it seems plausible that this might end up with an extra generous classification of novelty - either accidentally, or on purpose, as the author tries to optimize their algorithms. The other concern is that the number of possible novel environments also will depend on the total number of agents. Boost the number of agents, and it becomes much easier to generate more "novel" environments. Finally, as the agents are trained, they change - which means that the "novelty" of a past environment can change as well. What do they do with environments that used to be novel, but aren't any more?

Even so, with all that criticism, I can't really think of a better, equally generalizable novelty metric. Most novelty metrics would be constrained to a single problem, and need to be hand-engineered. The fact that you could just slap this metric on any problem using any generation methodology is a big plus. So if this works in practice on all types of problems, that is a big win.

The ANNECS metric depends on the novelty metric above, inheriting its problems. Also, it's basically impossible to compare any other existing techniques with that metric, so it seems kinda useless right now.

On another matter, I would really like to see how this technique performs on other environment-based reinforcement learning problems. The 2D walker problem might be difficult enough to work as a toy problem, but it has no practical use. I want to see if 3D physically based animation, for example, sees improvements from the POET techniques.

jrkirby

The new environment metric is really interesting! Thanks for sharing :)

maraoz

Enhanced POET: Open-ended Reinforcement Learning through Unbounded Invention of Learning Challenges

Exploring Open-Ended Algorithms: POET

POET: Paired Open-Ended Trailblazer | Paper Explained

POET: Endlessly Generating Increasingly Complex and Diverse Learning Environments and Solutions

The Importance of Open-Endedness in AI and Machine Learning a talk by Kenneth Stanley (OpenAI)

Max Jaderberg - Open-Ended Learning Leads to Generally Capable Agents @ UCL DARK

Jeff Clune, Uber AI Labs - Presenting POET

Open-Ended AI: The Key to Superhuman Intelligence?

Monthly AI in San Francisco #10: Reinforcement learning with open-ended algorithms at Uber

Population-Based Search and Open-Ended Algorithms

Accelerating Intelligence with AI-Generating Algorithms with Jeff Clune - 602

Improving Robot and Deep Reinforcement Learning via Quality Diversity, Open-Ended, and...

Open-Endedness Panel

Improving Deep Reinforcement Learning via Quality Diversity, Open-Ended and AI-Generating Algorithms

Open-ended and AI-generating Algorithms in the Era of Foundation Models: Research Talk by Jeff Clune

Jack Parker-Holder & Minqi Jiang - Open-Ended Learning Leads to Generally Capable Agents

Dr Rui Wang, Uber AI: 'Open-Ended Reinforcement Learning' - i4.0 Connect Forum

Open-Endedness and AI GAs in the Era of Foundation Models

ACCEL: Evolving Curricula with Regret-Based Environment Design (Paper Review)

(Newer, longer version available) Open-Endedness and AI GAs in the Era of Foundation Models

Joel Lehman Promise, Progress, and Challenges in Open-Ended Machine Learning, 21.Oct.2022

#038 - Prof. KENNETH STANLEY - Why Greatness Cannot Be Planned

AGI Seminar #3, Jeff Clune 'Open-Ended and AI-Generating Algorithms in the Era of Foundation Mo...