Model Based Reinforcement Learning: Policy Iteration, Value Iteration, and Dynamic Programming

Here we introduce dynamic programming, which is a cornerstone of model-based reinforcement learning. We demonstrate dynamic programming for policy iteration and value iteration, leading to the quality function and Q-learning.
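For readers who want to play with these ideas in code, here is a minimal policy-iteration sketch (not taken from the lecture or the book; the tabular transition model P[s][a] as a list of (probability, next_state, reward) tuples and all names are illustrative assumptions). It alternates policy evaluation with greedy policy improvement until the policy stops changing.

import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Alternate policy evaluation and greedy policy improvement."""
    pi = np.zeros(n_states, dtype=int)  # start from an arbitrary deterministic policy
    while True:
        # Policy evaluation: iterate V(s) = sum_s' P(s' | s, pi(s)) [r + gamma V(s')]
        V = np.zeros(n_states)
        while True:
            V_new = np.array([
                sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][pi[s]])
                for s in range(n_states)
            ])
            if np.max(np.abs(V_new - V)) < tol:
                break
            V = V_new
        # Policy improvement: act greedily with respect to the current value function
        pi_new = np.array([
            np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                       for a in range(n_actions)])
            for s in range(n_states)
        ], dtype=int)
        if np.array_equal(pi_new, pi):  # converged: the greedy policy is stable
            return V, pi
        pi = pi_new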

This is a lecture in a series on reinforcement learning, following the new Chapter 11 from the 2nd edition of our book "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz.

This video was produced at the University of Washington.
Comments

This is just awesome, especially for an undergraduate without much prior knowledge of machine learning. Many thanks from a Chinese freshman.

yiyangshao

I've watched other lectures on RL before, and I can understand the formulas much better now. The way you explain formulas is brilliant; you're a wonderful math lecturer.

ghazal

I love the way you explain it through the formulas. Most experts tell you the formula and then go to an actual case, which leaves the learner disconnected from the math. Thanks!

fredflintstone

At 3:57, I think the R(s', s, a) function you are referring to is the "reward function", which returns the immediate reward r if you are in state s and take action a, which leads to state s'. That would make more sense than "returning a PROBABILITY of a reward r given s, a, and s'". I saw this in your book as well but cannot find this kind of function anywhere else. In all the other resources I have found, this function R means the immediate reward of taking action a in state s and landing in new state s', NOT the probability of the reward.
Later in the clip, when you use it in the value function, I also see you use it as a measure of the value of the reward, not the probability of the reward, so I think this might really be a mistake or something.
If I'm getting it wrong somewhere, please help me clear up my thinking. I'm just curious.
Love your great work.

huyvuquang

16:55 The value iteration update (VI) differs slightly from Bellman's equation (BE) because VI takes the max over a single action a, whereas BE takes the max over all policies pi. Because pi is probabilistic, i.e. it yields a specific action a with a certain probability, writing VI in the BE form would require another level of summation over a, with the terms weighted by pi(s, a).
20:05 Here we construct pi(s, a) as the argmax of VI. That is, we set pi(s, argmax(s)) = 1 and pi(s, a') = 0 for all other actions a' ≠ argmax(s). This means pi(s, a) is deterministic instead of probabilistic.
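A minimal sketch of what the comment above describes, assuming a tabular MDP whose transition model P[s][a] is a list of (probability, next_state, reward) tuples (this format and all names are illustrative, not taken from the lecture): the value backup takes a max over individual actions, and the greedy policy extracted afterwards with argmax is deterministic, placing probability 1 on the best action and 0 on every other action.

import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-8):
    """Bellman backup with a max over single actions, then greedy policy extraction."""
    V = np.zeros(n_states)
    while True:
        # Max over a single action a; no expectation over a stochastic policy pi
        V_new = np.array([
            max(sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                for a in range(n_actions))
            for s in range(n_states)
        ])
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    # Deterministic greedy policy: probability 1 on the argmax action, 0 elsewhere
    pi = np.array([
        np.argmax([sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                   for a in range(n_actions)])
        for s in range(n_states)
    ], dtype=int)
    return V, pi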

micknamens

Optimal control, Control Theory, Reinforcement Learning, Machine Learning, System Theory, and System Identification are an intellectual banquet.

RasitEvduzen

This series really helps in understanding the mechanisms of RL. Thanks a ton for the effort!
Unintentionally showing a heart gesture at 21:35 😀

chandhru

I actually feel smarter after watching this. Excellent video on all fronts!

aaroncollinsworth

Thank you so much. I've watched a lot of videos and didn't fully get these concepts for some reason. Now I think I finally get it. You're a great teacher.

august

Beautiful. Please continue. Will you explain algorithms like PPO, TD3, DDPG, etc.? If so, I would appreciate each one. Also, it would be very interesting if you could give your opinion on some RL libraries like ray/RLlib, baselines3, etc. I know that this may be much more than what you are planning to include in this course, but I lose nothing by suggesting these topics to you :) Thank you.

mariogalindoq

Thank you, Prof! This video is really helpful for classifying RL methods. I really appreciate your diagram and your explanation.

adinovitarini

Your videos are incredibly well thought out and very educational; I should have found them sooner. Greetings from Munich, Germany!

samueldelsol

Thank you for simplifying a lot of things. I had read the corresponding chapters of the Sutton and Barto book, but I got more clarity on the practical aspects from this video.

NaveenKumar-yuvw

Now I understand every inch of the research paper I was reading. Thanks!!!!

suri

Very well structured and laid out, and clearly explained. Thank you!

asier

Thank you for making these videos. I'm learning so much! You are such a great explainer.

kevinchahine

This is an excellent companion to your book. Thanks for both!

matthewchunk

The Bellman equation reminds me of sequential games from game theory, where you traverse a game tree and the optimal choice at each branch leads to globally optimal states.

robinwang

Love this series! I hoped the video would go on and on, but it ended too quickly. Can't wait for the next part! Keep up the great work :)

Moonz

Thanks a lot, Professor Brunton!
You're creating great materials!

AliRashidi