RL Course by David Silver - Lecture 7: Policy Gradient Methods

#Reinforcement Learning Course by David Silver# Lecture 7: Policy Gradient Methods (updated video thanks to: John Assael)

Comments

People who feel like quitting at this stage: relax, take a break, watch this video over and over again, and read Sutton and Barto. Do everything but don't quit. You are amongst the 10% who came this far.

akshatgarg

Oh, I can't concentrate without seeing David.

alexanderyau

This is what I call commitment: David Silver explored a "not showing his face" policy, received less reward, and then switched back to the previous lectures' optimal policy.
Nothing like learning from this "one stream of data called life."

xicocaio

For those confused:
- Whenever he speaks of the u vector, he's talking about the theta vector (the slides don't match).
- At 1:02:00 he's talking about slide 4.
- At 1:16:35 he says Vhat, but the slides show Vv.
- He refers to Q in the Natural Policy Gradient section, which is actually Gtheta in the slides.
- At 1:30:30 the slide should be slide 41 (the last slide), not the Natural Actor-Critic slide.

JonnyHuman

Ahhh... where did you go, David? I loved your moderated gesturing.

saltcheese

This course should be called: "But wait, there's an even better algorithm!"

michaellin

And it turns out that this is still the best course for learning RL, even after 6 years.

krishnanjanareddy

3:24 Introduction
26:39 Finite Difference Policy Gradient
33:38 Monte-Carlo Policy Gradient
52:55 Actor-Critic Policy Gradient

NganVu

1:30 Outline

3:25 Policy-Based Reinforcement Learning
7:40 Value-Based and Policy-Based RL
10:15 Advantages of Policy Based RL
14:10 Example: Rock-Paper-Scissors
16:00 Example: Aliased Gridworld

20:45 Policy Objective Function
23:55 Policy Optimization
26:40 Policy Gradient
28:30 Computing Gradients by Finite Differences
30:30 Training AIBO to Walk by Finite Difference Policy Gradient
33:40 Score Function
36:45 Softmax Policy
39:28 Gaussian Policy
41:30 One-Step MDPs

46:35 Policy Gradient Theorem
48:30 Monte-Carlo Policy Gradient (REINFORCE)
51:05 Puck World Example

53:00 Reducing Variance Using a Critic
56:00 Estimating the Action-Value Function
57:10 Action-Value Actor-Critic
1:05:04 Bias in Actor-Critic Algorithms
1:05:30 Compatible Function Approximation
1:06:00 Proof of Compatible Function Approximation Theorem
1:06:33 Reducing Variance using a Baseline
1:12:05 Estimating the Advantage Function
1:17:00 Critics at Different Time-Scales
1:18:30 Actors at Different Time-Scales
1:21:38 Policy Gradient with Eligibility Traces

1:23:50 Alternative Policy Gradient Directions
1:26:08 Natural Policy Gradient
1:30:05 Natural Actor-Critic
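
To make the 48:30 Monte-Carlo Policy Gradient (REINFORCE) entry in the outline above concrete, here is a minimal sketch in Python, assuming a tabular softmax policy and a toy corridor MDP; the environment, the policy/step helpers, and the hyperparameters are illustrative assumptions, not code from the lecture.

import numpy as np

n_states, n_actions, gamma, alpha = 5, 2, 0.99, 0.05
rng = np.random.default_rng(0)
theta = np.zeros((n_states, n_actions))   # tabular parameters: one preference per (state, action)

def policy(s):
    # pi_theta(. | s): softmax over the action preferences of state s
    prefs = theta[s] - theta[s].max()
    p = np.exp(prefs)
    return p / p.sum()

def step(s, a):
    # Toy corridor: action 1 moves right, action 0 moves left; reward 1 at the right end.
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done

for episode in range(2000):
    s, done, trajectory = 0, False, []
    while not done:                        # sample one episode with the current policy
        a = rng.choice(n_actions, p=policy(s))
        s_next, r, done = step(s, a)
        trajectory.append((s, a, r))
        s = s_next
    G = 0.0
    for s, a, r in reversed(trajectory):   # accumulate the return G_t backwards
        G = r + gamma * G
        grad_log = -policy(s)              # grad of log softmax w.r.t. theta[s] is one_hot(a) - pi
        grad_log[a] += 1.0
        theta[s] += alpha * G * grad_log   # REINFORCE update: theta += alpha * grad log pi * G_t

print(policy(0))                           # should end up strongly preferring action 1 (move right)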

yasseraziz

I have to listen repeatedly because I could not concentrate without seeing him. I have to imagine what he was trying to show through his gestures. This is a gold-standard lecture for RL. Thank you, Professor David Silver.

finarwa

Damn. It was a lot easier understanding it with gestures.

WuuD

It would have been great if it were possible to recreate David in this lecture from his voice using some combination of RL frameworks.

georgegvishiani

Starts at 1:25.
Actor critic at 52:55.

chrisanderson

Unfortunately, the slides do not match what is being said. It's a pity they don't seem to put much effort into these videos. David is surely one of the best people to learn RL from.

MrCmon

I am not sure exactly how this video was created, but the right slide is often not displayed (especially near the end, but elsewhere as well). It is probably better to download the slides for the lecture and find your own way through them while listening to the audio.

liamroche

It is unfortunate that exactly this episode is without David on screen. It is again quite a complex topic, and David jumping and running around and pointing out the relevant parts makes it much easier to digest.

florentinrieger

Just to make sure: at 36:22, the purpose of the likelihood ratio trick is that the gradient of the objective function gets converted back into an expectation? Just as David said at 44:33, "... that's the whole point of using the likelihood ratio trick".
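
For reference, the identity the trick relies on (as on the score-function and one-step-MDP slides) is

\nabla_\theta \pi_\theta(s, a) = \pi_\theta(s, a) \, \nabla_\theta \log \pi_\theta(s, a)

so that for the one-step MDP

\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\left[ \nabla_\theta \log \pi_\theta(s, a) \, r \right]

which is again an expectation under the policy and can therefore be estimated by sampling.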

helinw

This lecture was immensely difficult to follow owing to David's absence and the mismatch of the slides.

akarshrastogi

“No matter how ridiculous the odds may seem, within us resides the power to overcome these challenges and achieve something beautiful. That one day we look back at where we started, and be amazed by how far we’ve come.” -Technoblade

I started this series a month ago during summer break. I even did the Easy21 assignment, and now I have finally learned what I wanted when I started this series, i.e. the Actor-Critic method. Time to do some gymnasium envs.

OmPrakash-vtvr

It took me a while to realize that the policy function pi(s, a) is alternately used as the probability of taking a certain action in state s and as the action proper (a notation overload that comes from the Sutton book). I think a specific notation for each instance would avoid a lot of confusion.
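
As far as I recall, the slides pin the notation down as a probability,

\pi_\theta(s, a) = \mathbb{P}\left[ a \mid s, \theta \right]

and the action actually executed at time t is then a separate sample drawn from that distribution, A_t \sim \pi_\theta(S_t, \cdot).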

jorgelarangeira