An introduction to Policy Gradient methods - Deep Reinforcement Learning

In this episode I introduce Policy Gradient methods for Deep Reinforcement Learning.

After a general overview, I dive into Proximal Policy Optimization: an algorithm designed at OpenAI that strikes a balance between sample efficiency and implementation complexity. PPO is the algorithm used to train the OpenAI Five system, and it is also used in a wide range of other challenges such as Atari games and robotic control tasks.

If you want to support this channel, here is my Patreon link:

Links mentioned in the video:
Comments

This is the best explanation of PPO on the net, hands down.

paulstevenconyngham

Easily the best explanation of PPO I've ever seen. Most papers and lectures get too tangled up in probabilistic principles and advanced mathematical derivations and completely lose sight of what these models are doing in high-level terms.

Alex-gcvo

I'm loving this RL series. Keep it up!

arkoraa

The value you provide in these videos is insane!
Thank you very much for guiding our learning process ;)

maloxi

This guy actually knows what he's talking about. Excellent video.

bigdreams

As someone who is working in the RL field... you did a very good job.

sarahjamal

I actually understood your explanation cover to cover on the first viewing, and the 19 minutes felt more like 5.

Outstanding work.

DavidSaintloth

He is actually much better than Siraj Raval.

akshatagrawal

Amazing! This was the best explanation of PPO I have seen so far.

alializadeh

Excellent video! Wonderful resource for anyone participating in AWS DeepRacer competitions.

BoltronRacingTeam

Explained so well, and it was intuitive as well. I learned more from this video than from all the articles I found on the internet. Great job.

tyson

Keep it up. Brevity is the soul of wit; it is indeed a skill to summarize the crux of a concept in such a lucid way!

yuktikaura

Thank you for including links for learning more in the description.

..

By far the best explanation on YouTube.

BDEvans

12:19
The min operator also prefers to undo a bad policy update:
If the advantage is positive but the probability of taking that action decreased, the min operator selects the unclipped objective, so the gradient can undo the bad update.

If the advantage is negative but the probability of taking that action increased, the min operator also selects the unclipped objective to undo the bad update, just as mentioned in the video.

Navhkrin
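
To see the behaviour described in the comment above concretely, here is a minimal NumPy sketch of PPO's clipped surrogate term, min(r_t(θ) * A_t, clip(r_t(θ), 1 - ε, 1 + ε) * A_t). The function name and the ratio / advantage / clip_eps variables are illustrative choices for this sketch, not code from the video or the paper.

import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    # ratio = pi_new(a|s) / pi_old(a|s); advantage = estimated advantage A_t
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # Taking the minimum keeps the unclipped term whenever it is the more
    # pessimistic one, which is exactly the "undo the bad update" case above.
    return np.minimum(unclipped, clipped)

# Positive advantage, but the action became less likely (ratio < 1 - eps):
print(ppo_clipped_objective(ratio=0.5, advantage=1.0))   # 0.5  -> unclipped term selected, gradient flows
# Negative advantage, but the action became more likely (ratio > 1 + eps):
print(ppo_clipped_objective(ratio=1.5, advantage=-1.0))  # -1.5 -> unclipped term selected, gradient flows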

It is a long video, no doubt, but once you finish watching it, you realize it was much better than actually reading the paper. Thanks, man!

MShahbazKharal

Coming back to this after thoroughly understanding Q-learning and looking into the advantage function in another network, this explanation is FAST. I wonder who could understand everything that is happening without background knowledge.

Samuel-wlfw

I watched all your videos today, great work! Love them!

zeyudeng

I watched this video more than 5 times, and it is the best video about PPO. Thank you for making great videos like this, and keep up the good work. P.S.: Your explanation was even simpler than that of Schulman, the creator of this algorithm. :)

scienceofart

Thank you so much for this video! This is way more insightful and intuitive than simply reading the papers!

Fireblazer