Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning (Paper Explained)

When AI makes a plan, it usually does so step by step, forward in time. But it is often beneficial to define intermediate goals that divide a large problem into easier sub-problems. This paper proposes a generalization of MCTS that searches not for the best next actions to take, but for the best way to recursively sub-divide the problem into sub-problems so tiny that each can be solved in a single step.

Abstract:
Standard planners for sequential decision making (including Monte Carlo planning, tree search, dynamic programming, etc.) are constrained by an implicit sequential planning assumption: the order in which a plan is constructed is the same order in which it is executed. We consider alternatives to this assumption for the class of goal-directed Reinforcement Learning (RL) problems. Instead of an environment transition model, we assume an imperfect, goal-directed policy. This low-level policy can be improved by a plan, consisting of an appropriate sequence of sub-goals that guide it from the start to the goal state. We propose a planning algorithm, Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS), for approximating the optimal plan by means of proposing intermediate sub-goals which hierarchically partition the initial tasks into simpler ones that are then solved independently and recursively. The algorithm critically makes use of a learned sub-goal proposal for finding appropriate partition trees of new tasks based on prior experience. Different strategies for learning sub-goal proposals give rise to different planning strategies that strictly generalize sequential planning. We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds as well as in challenging continuous control environments.
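To make the idea concrete, here is a minimal sketch of the recursion at the heart of the approach. This is a simplification in Python, not the authors' implementation: the function names, the greedy depth-first control flow, and the two callables are assumptions.

    def plan(start, goal, low_level_solves, propose_subgoals, depth=0, max_depth=8):
        """Recursively partition the task start -> goal into sub-tasks.

        low_level_solves(a, b): True if the imperfect low-level policy is
            judged able to reach b from a directly (one "step" of the plan).
        propose_subgoals(a, b): candidate intermediate states, best first.
        Returns a list of states [start, ..., goal], or None on failure.
        """
        if low_level_solves(start, goal):
            return [start, goal]              # base case: one-step sub-problem
        if depth >= max_depth:
            return None                       # give up on this branch
        for mid in propose_subgoals(start, goal):
            left = plan(start, mid, low_level_solves, propose_subgoals,
                        depth + 1, max_depth)
            if left is None:
                continue
            right = plan(mid, goal, low_level_solves, propose_subgoals,
                         depth + 1, max_depth)
            if right is not None:
                return left + right[1:]       # splice, dropping duplicated mid
        return None

DC-MCTS proper replaces this greedy recursion with a best-first tree search over possible partitions, using learned networks to propose sub-goals and to estimate the value of each split.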

Authors: Giambattista Parascandolo, Lars Buesing, Josh Merel, Leonard Hasenclever, John Aslanides, Jessica B. Hamrick, Nicolas Heess, Alexander Neitz, Theophane Weber

Comments

Thank you so much, you're the best! May God reward you!

friedrichwaterson

thanks mate, you saved my engineering thesis with that

karolszymczyk

The method seems to rely on the SELECT heuristic, i.e. it will perform well in environments that can be judged 'intuitively'. I wonder how it would perform in hard mazes with deceptive traps (e.g. where physical distance becomes a bad indicator of step-distance). I would expect the advantage to go away, since the guessing power of a neural net might be much more limited in such situations. So the interesting question is: how well can a neural net 'guess' a good subdivision state? This might depend heavily on how the problem is fed into the net, etc. (Disclaimer: I have not yet read the paper.)

bluelng

This is an interesting technique. The idea of trading off deep tree searches for wide ones is appealing.
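To make the trade-off concrete (toy numbers, not from the paper): a step-by-step planner must search to depth T for a goal T steps away, while always splitting at the midpoint gives a recursion depth of only about log2(T), at the cost of a much larger branching factor over candidate sub-goals.

    import math

    T = 256                                                  # hypothetical plan length
    print("sequential search depth:", T)                     # 256
    print("midpoint-split depth:", math.ceil(math.log2(T)))  # 8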

As it stands, I don't see it having much practical significance. It depends on having a deterministic, simulatable environment. It assumes a discrete state space. It requires perfect knowledge of the environment. It needs a cheap test of whether a state is valid or not. And it has only been tested on toy problems with a couple hundred states. Real problems frequently have more states than the number of atoms in the universe, and this kind of approach would probably just break down there.

Despite all that, there still may be some merit to the idea. You might be able to change the training so this still works even with absurdly large state spaces. Perhaps if you did that, you could beat AlphaGo with less training time. But what's the point in beating an AI at a game when the AI already dominates humans?

Often you can reformulate problems to make some of the assumptions it needs hold true, or reformulate the algorithm to work when they don't. That could make the algorithm applicable to some real-world practical problems. I suppose the question is: do we have problems where we'd like to use MCTS, but it's too slow because the tree depth to the goal is too great? Perhaps it's a bit like a hammer in search of a nail, but I guess it's nice to be clever enough at making hammers that we can beat any nail that comes our way.

I have a strong suspicion that in practical cases this should only be a fallback for when regular MCTS does not work: at every layer of the DC-MCTS search tree, before subdividing the problem further, check whether plain MCTS can solve it quickly. If so, there's no need to go deeper.
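Sketched out, that hybrid might look like this (hypothetical helpers passed in as callables; nothing here is from the paper):

    def solve(start, goal, try_mcts, select_subgoal, depth=0, max_depth=6):
        """Fall back to subdivision only when plain MCTS fails.

        try_mcts(a, b): a plan (list of states) or None, under a small budget.
        select_subgoal(a, b): one candidate intermediate state.
        """
        direct = try_mcts(start, goal)        # cheap attempt with vanilla MCTS
        if direct is not None or depth >= max_depth:
            return direct                     # solved directly, or out of depth
        mid = select_subgoal(start, goal)     # otherwise split and recurse
        left = solve(start, mid, try_mcts, select_subgoal, depth + 1, max_depth)
        right = solve(mid, goal, try_mcts, select_subgoal, depth + 1, max_depth)
        if left is None or right is None:
            return None
        return left + right[1:]               # splice, dropping the duplicate mid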

I see a lot of algorithmic freedom in how to train a model to efficiently and effectively serve as the SELECT heuristic. This will probably require a lot of future research before the technique is broadly effective, and probably every problem it is applied to will need its own twist. I think it's important to note: you don't necessarily even need to use neural nets as the model here. You could use random forests, or k-nearest neighbors, or any other model that works. In fact, some of these simpler models might be more effective with fewer training samples, which would make them a better fit.
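For instance, a k-nearest-neighbors regressor could stand in for the sub-goal proposer (a toy sketch with made-up features; the data layout is an assumption):

    import numpy as np
    from sklearn.neighbors import KNeighborsRegressor

    # Training data harvested from previously solved tasks: inputs are
    # concatenated (start, goal) coordinates, targets are the midpoints
    # of plans that actually worked.
    X = np.random.rand(100, 4)   # stand-in for concat(start_xy, goal_xy)
    y = np.random.rand(100, 2)   # stand-in for the observed good sub-goal

    proposer = KNeighborsRegressor(n_neighbors=5).fit(X, y)

    def propose_subgoal(start_xy, goal_xy):
        query = np.concatenate([start_xy, goal_xy]).reshape(1, -1)
        return proposer.predict(query)[0]  # average sub-goal of 5 nearest tasks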

Ultimately, the big question over whether this technique is worth pursuing is whether it can be made effective on problems with enormous, high-dimensional state spaces, because most of the really hard problems fall into that category. If it can be made to work there, it could have great potential.

jrkirby

Question: given enough time, can MCTS (Monte Carlo Tree Search) find the best solution?

The problem with MCTS is that it chooses the child node with the highest probability of containing a solution. As long as those probabilities don't change, MCTS will keep choosing the same node, no matter how many iterations you perform. That means some leaves (terminal nodes) are effectively unreachable, and if the best solution happens to be in such a leaf, MCTS will never find it.
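For reference, the standard UCT selection rule adds an exploration bonus that grows for rarely visited children, which is what keeps every leaf reachable in the limit (a generic UCB1 sketch, not specific to this paper):

    import math

    def uct_child(children, c=1.4):
        """Pick a child by UCB1: mean value plus an exploration bonus.
        Each child is assumed to carry .visits and .value_sum fields."""
        parent_visits = sum(ch.visits for ch in children)
        def score(ch):
            if ch.visits == 0:
                return float("inf")   # unvisited children are tried first
            return (ch.value_sum / ch.visits
                    + c * math.sqrt(math.log(parent_visits) / ch.visits))
        return max(children, key=score)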

hcm

Maybe one day, when we have large enough GPUs, it would be interesting to backpropagate through the TRAVERSE procedure (9:35). That might give the SELECT function enough gradients to train for more complex tasks. IDK, the whole procedure looks differentiable to me, if I didn't miss something.
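A toy illustration of what a differentiable relaxation of the select step might look like (a sketch under my own assumptions, not from the paper): replace the hard argmax over candidate sub-goals with a softmax, so gradients reach the scoring network.

    import torch

    logits = torch.randn(5, requires_grad=True)  # scores for 5 candidate sub-goals
    weights = torch.softmax(logits, dim=0)       # soft selection instead of argmax
    subgoal_values = torch.randn(5)              # stand-in for downstream values
    expected_value = (weights * subgoal_values).sum()
    expected_value.backward()                    # gradients flow back to the scorer
    print(logits.grad)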

herp_derpingson