An introduction to Policy Gradient methods - Deep Reinforcement Learning

In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.

After a general overview, I dive into Proximal Policy Optimization: an algorithm designed at OpenAI that strikes a balance between sample efficiency and implementation complexity. PPO is the algorithm used to train the OpenAI Five system, and it is also used in a wide range of other challenges such as Atari games and robotic control tasks.

If you want to support this channel, here is my Patreon link:

Links mentioned in the video:
Comments

This is the best explanation of PPO on the net, hands down.

paulstevenconyngham

Easily the best explanation of PPO I've ever seen. Most papers and lectures get too tangled up in probabilistic principles and advanced mathematical derivations and completely lose sight of what these models are doing in high-level terms.

Alex-gcvo

I'm loving this RL series. Keep it up!

arkoraa

The value you provide in these videos is insane!
Thank you very much for guiding our learning process ;)

maloxi

This guy actually knows what he's talking about. Excellent video.

bigdreams

As someone who is working in the RL field... you did a very good job.

sarahjamal

I actually understood your explanation cover to cover on the first viewing, and the 19 minutes felt more like 5.

Outstanding work.

DavidSaintloth

He is actually much better than Siraj Raval.

akshatagrawal

Amazing! This was the best explanation of PPO I have seen so far.

alializadeh

Excellent video! Wonderful resource for anyone participating in AWS DeepRacer competitions.

BoltronRacingTeam

Explained so well, and it was intuitive as well. I learned more from this video than from all the articles I found on the internet. Great job.

tyson

Keep it up. Brevity is the soul of wit; it is indeed a skill to summarize the crux of a concept in such a lucid way!

yuktikaura

Thank you for including links for learning more in the description.

..

By far the best explanation on YouTube.

BDEvans

12:19
The min operator also prefers to undo a bad policy update:
If the advantage is positive but the probability of taking that action decreased, the min operator selects the unclipped objective, so the gradient can undo the bad update.

If the advantage is negative but the probability of taking that action increased, the min operator also selects the unclipped objective to undo the bad update, just as mentioned in the video.

Navhkrin
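
To see the behaviour described in the comment above concretely, here is a minimal NumPy sketch of PPO's clipped surrogate term, min(r_t(θ) * A_t, clip(r_t(θ), 1 - ε, 1 + ε) * A_t). The function name and the ratio / advantage / clip_eps variables are illustrative choices for this sketch, not code from the video or the paper.

import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated advantage A_t
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the minimum keeps the unclipped term whenever it is the more
    # pessimistic one, which is exactly the "undo the bad update" case above.
    return np.minimum(unclipped, clipped)

# Positive advantage, but the action became less likely (ratio < 1 - eps):
print(ppo_clipped_objective(ratio=0.5, advantage=1.0))   # 0.5  -> unclipped term selected, gradient flows
# Negative advantage, but the action became more likely (ratio > 1 + eps):
print(ppo_clipped_objective(ratio=1.5, advantage=-1.0))  # -1.5 -> unclipped term selected, gradient flows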

It is a long video, no doubt, but once you finish watching it, you realize it was much better than actually reading the paper. Thanks, man!

MShahbazKharal

Coming back to this after thoroughly understanding Q-learning and looking into the advantage function in another network, this explanation is FAST. I wonder who could understand everything that is happening without background knowledge.

Samuel-wlfw

I watched all your videos today, great work! Love them!

zeyudeng

I watched this video more than 5 times, and it is the best video about PPO. Thank you for making great videos like this, and keep up the good work. P.S.: Your explanation was even simpler than that of Schulman, the creator of this algorithm. :)

scienceofart

Thank you so much for this video! This is way more insightful and intuitive than simply reading the papers!

Fireblazer