Regularizing Trajectory Optimization with Denoising Autoencoders (Paper Explained)

Can you plan with a learned model of the world? Yes, but there's a catch: The better your planning algorithm is, the more the errors of your world model will hurt you! This paper solves this problem by regularizing the planning algorithm to stay in high probability regions, given its experience.

Abstract:
Trajectory optimization using a learned model of the environment is one of the core elements of model-based reinforcement learning. This procedure often suffers from exploiting inaccuracies of the learned model. We propose to regularize trajectory optimization by means of a denoising autoencoder that is trained on the same trajectories as the model of the environment. We show that the proposed regularization leads to improved planning with both gradient-based and gradient-free optimizers. We also demonstrate that using regularized trajectory optimization leads to rapid initial learning in a set of popular motor control tasks, which suggests that the proposed approach can be a useful tool for improving sample efficiency.
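
As a rough illustration (not the authors' exact formulation), here is a minimal PyTorch sketch of gradient-based planning with a DAE penalty. It assumes a trained dynamics_model, a DAE dae trained with Gaussian corruption of scale sigma on the same (state, action) data, a differentiable reward_fn, and an initial state tensor s0; all names and hyperparameters are placeholders. The sketch relies on the standard result that a well-trained DAE g approximately satisfies g(x) - x = sigma^2 * grad log p(x).

```python
import torch

def plan_with_dae_regularization(dynamics_model, dae, reward_fn, s0,
                                 action_dim, horizon=20, iters=100,
                                 lr=0.05, eps=1e-2, sigma=0.1):
    """Gradient-based trajectory optimization regularized by a denoising
    autoencoder (DAE) trained on the same (state, action) data as the
    learned dynamics model. The planner maximizes predicted return plus
    eps * log p(trajectory), using the DAE score estimate
    grad log p(x) ~= (dae(x) - x) / sigma**2."""
    actions = torch.zeros(horizon, action_dim, requires_grad=True)
    opt = torch.optim.Adam([actions], lr=lr)

    for _ in range(iters):
        opt.zero_grad()
        s, ret, xs = s0, 0.0, []
        for a in actions:                      # roll out the learned model
            xs.append(torch.cat([s, a]))       # (state, action) pair
            s = dynamics_model(s, a)           # predicted next state
            ret = ret + reward_fn(s, a)        # predicted reward
        x = torch.stack(xs)                    # (horizon, state_dim + action_dim)

        with torch.no_grad():                  # score estimate from the DAE;
            score = (dae(x) - x) / sigma**2    # no gradient through the DAE itself

        # Surrogate whose gradient w.r.t. x is exactly eps * score, i.e. it
        # pushes the planned trajectory toward high-probability regions.
        reg = eps * (score * x).sum()

        (-(ret + reg)).backward()
        opt.step()

    return actions.detach()
```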

Authors: Rinu Boney, Norman Di Palo, Mathias Berglund, Alexander Ilin, Juho Kannala, Antti Rasmus, Harri Valpola

Links:
Comments

You absolutely smashed this review, nice one Yannic! I know Harri will love it. Can't wait to get our interview with him uploaded to the Street Talk channel.

machinelearningdojo

Seems like this approach would pair quite well with the "Planning to Explore" paper! Great stuff, thanks for sharing

jakevikoren

Hello sir, is there any possibility you could teach me to write a DAE for my project?

qw

Where can I find the link for the interview with Harri Valpola?

remix

I don't see where this would be used over other methods. Are we waiting for a good exploration factor to be discovered which can be added to the equation?
What if we go the opposite route and subtract the confidence factor to encourage exploration?

rishikaushik

Perhaps phasing the regularization out over time would help - the assumption being that the model becomes more accurate and better able to generalise, so the planner should be allowed to explore more.

jeremykothe
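
The simplest version of that would be an annealing schedule on the regularization weight; a hypothetical sketch (eps0, decay, and eps_min are arbitrary, and eps is the regularization weight from the sketch above):

```python
def annealed_eps(episode, eps0=1e-2, decay=0.97, eps_min=1e-4):
    """Hypothetical schedule: shrink the DAE regularization weight as the
    dynamics model sees more data, letting the planner stray further from
    past experience in later episodes."""
    return max(eps0 * decay ** episode, eps_min)
```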

Could an approach like this work for automatic PID tuning for quadcopters? How much processing power does it require?

tiagotiagot

This might be a silly question, but in the case of offline learning (using existing data for a process you want to model), can you still learn a world model (e.g. any DNN that can create a latent representation of it offline) and use it later to sample trajectories? Does this mean this approach has two different DNNs, one for sampling and one for regularizing, or is the DAE the same model? Sorry if this was obvious in the video.

theodorosgalanos

In (5), why do you need the gradient with respect to the action? Do we not just take the action with the highest reward?

pragmascrypt
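
One thing a sketch may clarify: the planner optimizes a whole sequence of continuous actions, so there is no finite set of actions to pick the best from. Equation (5) does that optimization with gradients; the gradient-free alternative the paper also evaluates is roughly the cross-entropy method below (illustrative only; the DAE regularization would be folded into score_fn).

```python
import numpy as np

def cem_plan(score_fn, horizon, action_dim, iters=10, pop_size=500, n_elite=50):
    """Cross-entropy method over action sequences: sample candidates, keep
    the best-scoring ones, refit the sampling distribution, repeat.
    score_fn(actions) should return predicted return plus any
    regularization (e.g. the DAE log-probability term)."""
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        candidates = mean + std * np.random.randn(pop_size, horizon, action_dim)
        scores = np.array([score_fn(c) for c in candidates])
        elites = candidates[np.argsort(scores)[-n_elite:]]  # top-scoring sequences
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # the first action of this sequence would be executed
```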

Couldn't you also use the inverse of the probabilities to find states/trajectories that you haven't visited yet and use them to explore? When this works, it would also not suffer from retrospective novelty as described in the Planning to Explore paper.

timz
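
Mechanically, flipping the regularizer into an exploration bonus would be a small change: reward trajectories the DAE reconstructs poorly instead of penalizing them. A speculative sketch (this is the opposite of what the paper does; dae and sigma are the same placeholders as above):

```python
import torch

def novelty_bonus(dae, x, sigma=0.1):
    """Speculative: reward trajectories the DAE reconstructs poorly, i.e.
    ones unlike anything in the collected data. This is the opposite sign
    of the paper's regularizer, which penalizes such trajectories."""
    with torch.no_grad():
        residual = dae(x) - x                    # large where p(x) is low
    return (residual.norm(dim=-1) ** 2).sum() / sigma**2
```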

Can't you just have the world model output an uncertainty metric that's high in unexplored areas, and have the trajectory optimizer take that uncertainty into account by treating less certain areas as higher cost? Then the trajectory will tend to avoid areas that haven't been explored.

jrkirby
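
That is roughly what ensemble-based model-based RL methods do with model disagreement. A minimal sketch of such an uncertainty penalty, where models is a hypothetical ensemble of learned dynamics models (not part of the paper's proposed regularizer):

```python
import torch

def uncertainty_penalty(models, s, a):
    """Disagreement among an ensemble of learned dynamics models as an
    uncertainty estimate; a planner can add this as extra cost so it
    avoids state-action regions the models were not trained on."""
    preds = torch.stack([m(s, a) for m in models])  # (n_models, state_dim)
    return preds.var(dim=0).sum()                   # high where models disagree
```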

Hmm, I would be surprised if the DAE had the usual AE architecture, bottleneck included, and DAE(xtilde) - xtilde still gave useful results at all. I guess they use something like a "residual" network where the input xtilde is added to the output, so that it basically only has to predict the difference itself.

dermitdembrot
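
For reference, a DAE with that kind of skip connection, where the network only has to predict the denoising correction, is easy to write down. A generic PyTorch sketch, not necessarily the authors' architecture:

```python
import torch
import torch.nn as nn

class ResidualDAE(nn.Module):
    """Denoising autoencoder with a skip connection: the network outputs a
    correction that is added back to the (noisy) input, so it only has to
    learn the denoising residual rather than reconstruct x from scratch."""
    def __init__(self, dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_noisy):
        return x_noisy + self.net(x_noisy)       # predict only the difference

def dae_loss(dae, x, sigma=0.1):
    """Standard denoising objective: corrupt x with Gaussian noise and
    reconstruct the clean x."""
    x_noisy = x + sigma * torch.randn_like(x)
    return ((dae(x_noisy) - x) ** 2).mean()
```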