Enhanced POET: Open-Ended RL through Unbounded Invention of Learning Challenges and their Solutions

The enhanced POET makes some substantial and well-crafted improvements over the original POET algorithm and excels at open-ended learning like no system before.

Abstract:
Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of limitations of the algorithm itself and because of external issues including a limited problem space and lack of a universal progress measure. Importantly, both limitations pose impediments not only for POET, but for the pursuit of open-endedness in general. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations designed to help elucidate its full potential. Together, these four advances enable the most open-ended algorithmic demonstration to date. The algorithmic innovations are (1) a domain-general measure of how meaningfully novel new challenges are, enabling the system to potentially create and solve interesting challenges endlessly, and (2) an efficient heuristic for determining when agents should goal-switch from one problem to another (helping open-ended search better scale). Outside the algorithm itself, to enable a more definitive demonstration of open-endedness, we introduce (3) a novel, more flexible way to encode environmental challenges, and (4) a generic measure of the extent to which a system continues to exhibit open-ended innovation. Enhanced POET produces a diverse range of sophisticated behaviors that solve a wide range of environmental challenges, many of which cannot be solved through other means.
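Innovation (1) can be made concrete with a small sketch. The idea, as far as the abstract describes it, is to characterize an environment by how agents perform on it instead of by its hand-designed parameters. The version below assumes a rank-based characterization: every agent is scored on the environment, the scores are clipped and converted to normalized ranks, and two environments are compared by the Euclidean distance between their rank vectors. The function names, the clipping bounds, the `k` nearest-neighbor averaging, and the toy `evaluate` function are all illustrative assumptions, not the authors' implementation.

```python
from math import sqrt


def rank_normalize(scores):
    """Replace raw scores with evenly spaced ranks in [-0.5, 0.5].

    Ties are broken by agent order (sorted() is stable); a real system
    might prefer averaged ranks.
    """
    n = len(scores)
    if n == 1:
        return [0.0]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for position, agent_index in enumerate(order):
        ranks[agent_index] = position / (n - 1) - 0.5
    return ranks


def characterize(env, agents, evaluate, low=-100.0, high=300.0):
    """Describe an environment by the ordering it induces over the agents.

    `low` and `high` are placeholder clipping bounds (an assumption).
    """
    clipped = [min(max(evaluate(agent, env), low), high) for agent in agents]
    return rank_normalize(clipped)


def distance(a, b):
    """Euclidean distance between two rank vectors of equal length."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty(candidate, archive, k=5):
    """Mean distance from a candidate to its k nearest archived neighbors."""
    if not archive:
        return float("inf")
    nearest = sorted(distance(candidate, other) for other in archive)[:k]
    return sum(nearest) / len(nearest)


# Toy setup (hypothetical): agents and environments are plain numbers,
# and an agent scores higher the closer it is to the environment.
agents = [0.0, 1.0, 2.0]


def evaluate(agent, env):
    return -abs(agent - env)


archive = [characterize(0.0, agents, evaluate)]
near_duplicate = novelty(characterize(0.1, agents, evaluate), archive)
reversed_order = novelty(characterize(2.0, agents, evaluate), archive)
```

In this toy setup, an environment that orders the agents the same way as an archived one gets a novelty of zero, while one that reverses the ordering scores much higher. Because only rankings matter, the measure does not depend on the scale of the raw scores, which is what makes it domain-general; it does depend on which agents are in the pool.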

Authors: Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley

Comments

Great video! It would be nice if you could review the paper "On the Measure of Intelligence" by François Chollet. That would be a neat segue and helpful for researchers in this field.

alibaheri

The environment novelty metric is interesting. On the surface, it sounds like it should work well, but I feel it needs a bit more justification. Both the environments and the agents are created algorithmically. Since the agents are used to judge the environments, it seems plausible that this could end up producing an overly generous classification of novelty, either accidentally or on purpose, as the authors try to optimize their algorithms. The other concern is that the number of possible novel environments will also depend on the total number of agents. Boost the number of agents, and it becomes much easier to generate more "novel" environments. Finally, as the agents are trained, they change, which means that the "novelty" of a past environment can change as well. What do they do with environments that used to be novel but aren't any more?

Even so, with all that criticism, I can't really think of a better, equally generalizable novelty metric. Most novelty metrics would be constrained to a single problem, and need to be hand-engineered. The fact that you could just slap this metric on any problem using any generation methodology is a big plus. So if this works in practice on all types of problems, that is a big win.

The ANNECS metric depends on the novelty metric above, inheriting its problems. Also, it's basically impossible to compare any other existing techniques with that metric, so it seems kinda useless right now.


On another note, I would really like to see how this technique performs on other environment-based reinforcement learning problems. 2D walker problems might be difficult enough to work as toy problems, but they have no practical use. I want to see if 3D physically based animation, for example, sees improvements from the POET techniques.

jrkirby

The new environment metric is really interesting! Thanks for sharing :)

maraoz