Deep RL Bootcamp Lecture 5: Natural Policy Gradients, TRPO, PPO

Instructor: John Schulman (OpenAI)
Lecture 5, Deep RL Bootcamp, Berkeley, August 2017
Natural Policy Gradients, TRPO, PPO
Comments

This is really dense, but also clears a lot up. I'll have to watch a second time.

littlebigphil

This is much less comprehensible than the previous lectures =<

arielel

AWW YEAH TRUST REGION THIS IS WHAT I NEEDED

THANKS!

dewinmoonl

I think the loss function at 7:30 should have the opposite sign, because the gradient is derived for gradient ascent. So for gradient descent we should pretend the gradient has the opposite sign, and if we derive the loss function for that, the minus sign will carry through. Am I right?

mansurZ
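
On the sign question above: the surrogate at that point in the lecture is an objective to be maximized, so a framework whose optimizer minimizes a loss uses its negative, and the minus sign does carry through. A minimal sketch, assuming a PyTorch-style setup (names such as `pg_loss` are illustrative, not from the lecture):

```python
import torch

# Minimal sketch of the sign convention (assumed PyTorch-style, not the
# lecture's code): the policy-gradient surrogate E[log pi(a|s) * A] is an
# objective to MAXIMIZE, so an optimizer that MINIMIZES gets its negative.

def pg_loss(log_probs, advantages):
    # log_probs:  log pi(a_t | s_t) for the sampled actions
    # advantages: advantage estimates A_t
    surrogate = (log_probs * advantages).mean()  # quantity we want to increase
    return -surrogate                            # negate so that descent ascends

# usage (placeholder tensors; only the shapes matter)
log_probs = torch.randn(128, requires_grad=True)
advantages = torch.randn(128)
loss = pg_loss(log_probs, advantages)
loss.backward()
```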

Why calculate the ratio of the new and old policies when the log prob is good enough anyway? Is it because we want to use the ratio for clipping?

OliverZeigermann
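
On the ratio question above: the probability ratio r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t), computed as exp(logp_new - logp_old), is both the importance weight that corrects for the data having been sampled from the old policy and the natural quantity to clip around 1. A minimal sketch of the clipped surrogate, assuming PyTorch-style tensors (names are illustrative, not from the lecture):

```python
import torch

# Minimal sketch of the PPO clipped surrogate (assumed PyTorch-style,
# not the lecture's code). The ratio pi_new/pi_old is what gets clipped
# to [1 - eps, 1 + eps]; the raw log prob alone would not give this.

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    ratio = torch.exp(logp_new - logp_old)               # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return torch.min(unclipped, clipped).mean()          # objective to maximize
```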

The explanations are all over the place - put some structure into the way you explain things.

mfavaits

At 14:55, when maximizing the objective function with a KL-divergence penalty, what if the expected values become negative?

shaz
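
On the KL-penalty question above: the penalized objective L(pi) - beta * KL(pi_old, pi_new) is simply maximized, and it is allowed to go negative; a negative value just means the candidate update currently looks worse than the old policy, so the optimizer moves away from it. A minimal sketch, assuming PyTorch-style tensors and a simple sample-based KL estimate (the names and the value of beta are illustrative, not from the lecture):

```python
import torch

# Minimal sketch of the KL-penalized surrogate (assumed PyTorch-style,
# not the lecture's code). The value can be negative; it is still just
# maximized (or its negative minimized).

def kl_penalized_objective(logp_new, logp_old, advantages, beta=1.0):
    ratio = torch.exp(logp_new - logp_old)        # importance weight pi_new/pi_old
    surrogate = (ratio * advantages).mean()
    # crude sample estimate of KL(pi_old || pi_new) using pi_old's samples
    approx_kl = (logp_old - logp_new).mean()
    return surrogate - beta * approx_kl
```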