Overview of Deep Reinforcement Learning Methods

This video gives an overview of methods for deep reinforcement learning, including deep Q-learning, actor-critic methods, deep policy networks, and policy gradient optimization algorithms.

This is a lecture in a series on reinforcement learning, following the new Chapter 11 from the 2nd edition of our book "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz.

This video was produced at the University of Washington.
Comments

10:20 I think in this example the state probability density function is assumed stationary for an ergodic environment, even in the case of a dynamic policy. So perhaps this assumption implies a static reward function from the given environment, which would not be the case in a dynamic environment like a medical patient whose bodily response to a drug would vary throughout their lifetime/treatment. I checked: Sutton and Barto indeed mention ergodicity of the environment as the reason for a policy-independent mu in their book, on pp. 326 and 333.

MaximYudayev
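For readers following along, the formula under discussion at 10:20 appears to be the policy gradient theorem; in Sutton and Barto's notation it reads

\nabla_\theta J(\theta) \;\propto\; \sum_{s} \mu(s) \sum_{a} q_\pi(s,a)\, \nabla_\theta \pi(a \mid s, \theta),

where \mu(s) is the on-policy state distribution. The key feature is that the right-hand side contains no \nabla_\theta \mu(s) term, which is why the question of whether \mu depends on the policy comes up at all.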

At 10:05, my understanding is that the fact that we do not differentiate that probability comes from a local-approximation assumption, so the formula is only approximately true for changes that are not too big. This simplification is one of the most important parts of the policy gradient theorem, and it informs the design of "soft" policy-gradient algorithms, in which we do not allow the policy to change too much, since the update logic only works for small steps.

matiascova
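One concrete example of the "soft" update idea described above is the clipped surrogate objective used in PPO-style methods, which keeps each gradient step close to the policy that collected the data. A minimal sketch in PyTorch (the function name, arguments, and clip_eps value are illustrative, not taken from the lecture):

import torch

def clipped_surrogate_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate objective, negated so it can be minimized.

    log_probs_new : log pi_theta(a|s) under the current policy (requires grad)
    log_probs_old : log pi_theta_old(a|s) from the policy that collected the data
    advantages    : advantage estimates for the sampled (s, a) pairs
    """
    # Probability ratio r(theta) = pi_theta(a|s) / pi_theta_old(a|s)
    ratio = torch.exp(log_probs_new - log_probs_old.detach())

    # Taking the elementwise minimum of the unclipped and clipped terms removes
    # any incentive to move the ratio outside [1 - clip_eps, 1 + clip_eps].
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()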

This is literally the best series for understanding RL ever. Thank you so much, professor, for sharing this.

BoltzmannVoid

Thanks, Professor Steve. Once I hear "welcome back," I just know it's our original Professor Steve 😀👍

metluplast

I hope you'll create a series where all of the equations from these lectures are applied in PyTorch to build simple projects; that would be awesome.

mawkuri

Excellent video; it basically saved my day in trying to wrap my head around all the terms and algorithms :D
The concepts have been presented with unmatched clarity and conciseness.
I have been waiting for this since your last video on "Q-Learning".

Thank you so much!

gbbhkk

Thanks for the video! Can't wait for that deep MPC video.

Rodrigoviverosa

This is a fantastic tutorial. Thanks for putting in the time and effort to make it so digestible.

dmochow

Thank you, professor. This has been great for dusting off some RL concepts I had forgotten.

BlueOwnzU

6:22 But Professor, you know we love math derivations!

FRANKONATOR

@Eigensteve
Amazing video lectures. I have watched several of your series.
Please, if possible, make a series about deep MPC; it would be of great value.

wkafa

Steve, I follow all of your lectures. As a mechanical engineer, I was really amazed by your turbulence lectures. I have personally worked with CFD using scientific Python, doing visualization and computation, and have published a couple of research articles. I'm very eager to work under your guidance in the field of CFD and fluid dynamics using machine learning, specifically simulating and modelling turbulent flow fields, and to explore the mysterious world of turbulence. How should I reach you for further communication?

ramanujanbose

@10:33 Steve, maybe mu sub theta is just a vector of constants for the means associated with the asymptotic distribution of each state s, used to scale the sum of weighted probabilities across all actions for that state relative to each state's asymptotic distribution?

ryanmckenna

10:20, I think it's because we usually use PG in models with infinite state-action pairs, so in other words mu(s) is untrackable. It's something like the latent space of an autoencoder, where we can't really track it in order to generate data.

sarvagyagupta
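A related point for anyone puzzling over mu(s): in practice it never has to be computed, because the sum over states can be rewritten as an expectation over states visited while following the current policy,

\nabla_\theta J(\theta) \;\propto\; \mathbb{E}_{s \sim \mu,\, a \sim \pi}\!\left[ q_\pi(s,a)\, \nabla_\theta \log \pi(a \mid s, \theta) \right],

which is estimated simply by sampling trajectories. This is the standard log-derivative (REINFORCE) form, so mu(s) enters only implicitly through where the policy actually spends its time.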

Excuse me, professor, I am not sure about this specific case: if we have a DRL architecture that interacts with an ad hoc model we have built (which has a given structure as a Markov decision process), but the DRL agent does not have any prior information about the mechanics of that model (it can only measure outputs and generate inputs), would this be considered model-free?
Thank you for your amazing work!

joel.r.h

So my strategy for a better explanation would be to do it like Andrej: start off with a toy example of the real algorithm and also show the Python toy code. Explain how it is connected to other models. After that you can start with the math derivation, which is mostly interesting only for ML theorists.

randywelt
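In that spirit, here is a self-contained toy sketch of REINFORCE on a two-armed bandit in NumPy; the problem, names, and hyperparameters below are illustrative and not taken from the lecture:

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])   # expected reward of each arm; the second arm is better
theta = np.zeros(2)                 # policy parameters (softmax preferences)
alpha = 0.1                         # learning rate

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    action = rng.choice(2, p=probs)               # sample an action from pi_theta
    reward = rng.normal(true_means[action], 0.1)  # noisy reward from the environment
    grad_log_pi = -probs                          # d/dtheta_i log pi(action) = 1{i==action} - pi(i)
    grad_log_pi[action] += 1.0
    theta += alpha * reward * grad_log_pi         # REINFORCE: step along the reward-weighted score function

print("learned action probabilities:", softmax(theta))  # should put most probability on the better arm

Without a baseline this estimator is noisy, but it is the same gradient-of-log-probability idea that the deep versions scale up with neural networks.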

In DDQN, did you need the Q function (Theta_2) inside the gradient involving d/dTheta?

add-mtxc
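On the question above: in Double DQN as described by van Hasselt et al., the target is computed with the second (target) network but is treated as a constant, so it does not sit inside the gradient. Writing theta for the online network and theta' for the target network (the lecture's Theta_1/Theta_2 labels may be assigned the other way around), the update is based on

y = r + \gamma\, Q\!\big(s',\, \arg\max_{a'} Q(s', a'; \theta);\; \theta'\big),
\qquad
\mathcal{L}(\theta) = \big(y - Q(s, a; \theta)\big)^2,

and d\mathcal{L}/d\theta is taken only through Q(s, a; \theta), with y held fixed (a stop-gradient or .detach() in code).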

Awesome 🎉. Where can I find similar code tutorials?

a_samad

Prof. Brunton, are you using a lightboard for the lectures? Do you have advice on which one to purchase?

add-mtxc

You are great!!! A really helpful video. But sir, you did not talk about the MDP.

MrAsare