Deep RL Bootcamp Lecture 5: Natural Policy Gradients, TRPO, PPO

Instructor: John Schulman (OpenAI)
Lecture 5, Deep RL Bootcamp, Berkeley, August 2017
Natural Policy Gradients, TRPO, PPO
Comments

This is really dense, but also clears a lot up. I'll have to watch a second time.

littlebigphil

This is much less comprehensible than the previous lectures =<

arielel

AWW YEAH TRUST REGION THIS IS WHAT I NEEDED

THANKS!

dewinmoonl

I think the loss function at 7:30 should have the opposite sign, because the gradient is derived for gradient ascent. So for gradient descent we should pretend the gradient has the opposite sign, and if we derive the loss function for that, the minus sign will carry through. Am I right?

mansurZ
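
On the sign question above: the surrogate at that point in the lecture is an objective to be maximized, so a framework whose optimizer minimizes a loss uses its negative, and the minus sign does carry through. A minimal sketch, assuming a PyTorch-style setup (names such as `pg_loss` are illustrative, not from the lecture):

```python
import torch

# Minimal sketch of the sign convention (assumed PyTorch-style, not the
# lecture's code): the policy-gradient surrogate E[log pi(a|s) * A] is an
# objective to MAXIMIZE, so an optimizer that MINIMIZES gets its negative.

def pg_loss(log_probs, advantages):
    # log_probs:  log pi(a_t | s_t) for the sampled actions
    # advantages: advantage estimates A_t
    surrogate = (log_probs * advantages).mean()  # quantity we want to increase
    return -surrogate                            # negate so that descent ascends

# usage (placeholder tensors; only the shapes matter)
log_probs = torch.randn(128, requires_grad=True)
advantages = torch.randn(128)
loss = pg_loss(log_probs, advantages)
loss.backward()
```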

Why calculate the ratio of the new and old policies when the log prob is good enough anyway? Is it because we want to use the ratio for clipping?

OliverZeigermann
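
On the ratio question above: the probability ratio r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t), computed as exp(logp_new - logp_old), is both the importance weight that corrects for the data having been sampled from the old policy and the natural quantity to clip around 1. A minimal sketch of the clipped surrogate, assuming PyTorch-style tensors (names are illustrative, not from the lecture):

```python
import torch

# Minimal sketch of the PPO clipped surrogate (assumed PyTorch-style,
# not the lecture's code). The ratio pi_new/pi_old is what gets clipped
# to [1 - eps, 1 + eps]; the raw log prob alone would not give this.

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)               # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return torch.min(unclipped, clipped).mean()          # objective to maximize
```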

The explanations are all over the place - put some structure into the way you explain things.

mfavaits

At 14:55, when maximizing the objective function with a KL-divergence penalty, what if the expected values become negative?

shaz
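
On the KL-penalty question above: the penalized objective L(pi) - beta * KL(pi_old, pi_new) is simply maximized, and it is allowed to go negative; a negative value just means the candidate update currently looks worse than the old policy, so the optimizer moves away from it. A minimal sketch, assuming PyTorch-style tensors and a simple sample-based KL estimate (the names and the value of beta are illustrative, not from the lecture):

```python
import torch

# Minimal sketch of the KL-penalized surrogate (assumed PyTorch-style,
# not the lecture's code). The value can be negative; it is still just
# maximized (or its negative minimized).

def kl_penalized_objective(logp_new, logp_old, advantages, beta=1.0):
    ratio = torch.exp(logp_new - logp_old)        # importance weight pi_new/pi_old
    surrogate = (ratio * advantages).mean()
    # crude sample estimate of KL(pi_old || pi_new) using pi_old's samples
    approx_kl = (logp_old - logp_new).mean()
    return surrogate - beta * approx_kl
```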