Bellman Equations, Dynamic Programming, Generalized Policy Iteration | Reinforcement Learning Part 2

Part two of a six-part series on Reinforcement Learning. We discuss the Bellman Equations, Dynamic Programming, and Generalized Policy Iteration.

SOURCES

[1] R. Sutton and A. Barto. Reinforcement Learning: An Introduction (2nd ed.). MIT Press, 2018.

SOURCE NOTES

The video covers the topics of Chapters 3 and 4 of [1]. The whole series teaches from [1]. [2] was a useful secondary resource.

TIMESTAMPS
0:00 What We'll Learn
1:09 Review of Previous Topics
2:46 Definition of Dynamic Programming
3:05 Discovering the Bellman Equation
7:13 Bellman Optimality
8:41 A Grid View of the Bellman Equations
11:24 Policy Evaluation
13:58 Policy Improvement
15:55 Generalized Policy Iteration
17:55 A Beautiful View of GPI
18:14 The Gambler's Problem
20:42 Watch the Next Video!
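
As a rough illustration of the later topics in the list above (the Bellman optimality backup and the Gambler's Problem), here is a minimal Python sketch of value iteration, one instance of generalized policy iteration. It is not taken from the video. It assumes the textbook setup from [1]: a goal of 100, a reward of +1 only on reaching the goal, no discounting, and a heads probability of 0.4. The function name and parameters are hypothetical.

```python
def value_iteration(p_heads=0.4, goal=100, theta=1e-9):
    """Value iteration for the Gambler's Problem (assumed setup, see lead-in)."""
    # v[s] estimates the probability of reaching the goal from capital s.
    v = [0.0] * (goal + 1)
    # Assumption: the +1 terminal reward is folded into the goal state's value.
    v[goal] = 1.0
    while True:
        delta = 0.0
        for s in range(1, goal):
            # Bellman optimality backup: best expected value over all legal stakes.
            best = max(
                p_heads * v[s + stake] + (1 - p_heads) * v[s - stake]
                for stake in range(1, min(s, goal - s) + 1)
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place update, reused within the same sweep
        if delta < theta:
            return v


values = value_iteration()
print(values[25], values[50], values[75])
```

Under these assumptions, the value at a capital of 50 should come out close to 0.4, since staking everything there wins with probability 0.4.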

Comments

Great video! Can you explain more about that "sneaky" equation at around 6:00? Why is G_{t+1} = v(S_{t+1}) in the expectation?

mbeloch
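
A short worked derivation may help with the question above. It is a sketch in the notation of [1], not a transcription of the video, and it assumes the step at that timestamp is the usual substitution inside the Bellman equation. The return G_{t+1} is not equal to v(S_{t+1}) as a random variable. The two only agree in expectation, by the law of total expectation together with the Markov property:

```latex
\begin{align*}
v_\pi(s) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right]
          = \mathbb{E}_\pi\left[ R_{t+1} + \gamma G_{t+1} \mid S_t = s \right], \\
\mathbb{E}_\pi\left[ G_{t+1} \mid S_t = s \right]
         &= \sum_{s'} \Pr(S_{t+1} = s' \mid S_t = s)\,
            \mathbb{E}_\pi\left[ G_{t+1} \mid S_{t+1} = s',\, S_t = s \right] \\
         &= \sum_{s'} \Pr(S_{t+1} = s' \mid S_t = s)\, v_\pi(s')
            \qquad \text{(Markov property)} \\
         &= \mathbb{E}_\pi\left[ v_\pi(S_{t+1}) \mid S_t = s \right], \\
\text{so}\quad v_\pi(s) &= \mathbb{E}_\pi\left[ R_{t+1} + \gamma\, v_\pi(S_{t+1}) \mid S_t = s \right].
\end{align*}
```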

I can't express how good these videos are, thank you so much for all the time you put into making them! this is a truly special channel

TheRealExecuter

So far the best and optimized playlist for reinforcement learning.

abhinavanand

Let's read from the textbook. *He opens the book, then stares at the camera and confidently recites from memory*.

mCoding

After going through some books and paid courses, I finally understand the fundamentals of RL through your video. Well done and subscribed.

kplim

Kudos, good sir. Your pedagogical skill is both impressive and efficient.
Please continue to grace the world with it for the good of all of mankind.

timothytyree

You saved me a lot of time with a simple, concise, and easy-to-follow video compared to the others I have seen so far.

rajatjaiswal

This is the best reinforcement learning resource available on the internet, period.

manudasmd

Imagine if such great educational videos existed for all foundational topics in artificial intelligence, engineering, math, and physics. We are slowly getting there :). 3b1b py module manim has made it quite accessible to create high-quality, time efficient (for learning) educational content. It's amazing what people create. Thank you for the great videos!

valterszakrevskis

THESE ARE THE BEST VIDEOS ON THIS TOPIC EVER, AND YOUR WAY OF EXPLAINING AND MAKING THINGS SOUND SO SIMPLE IS INCREDIBLE, THANK YOU A MILLION TIMES

fzet

I really enjoyed the video!
It's really helpful to me that you teach with fluctuating intonation because otherwise I can't really focus. Good job!

lyzhenyang

Damn, it really only took you 20 minutes to explain something that my professor needed two full lectures for. Thank you so much! This was so helpful

nicolaiholtkamp

best video lectures of rl on the internet

hassaniftikhar

One of the best series, if not the best, in describing DRL.
Good Job !!!!

Yahia.N_Ahmed

Your videos are like espresso: condensed, tasty, full-bodied, but you should not try to rush when watching them. There are no spare words, so when you miss one, you're lost 😀 Great video, I love that logical structure, rock solid!

marcin.sobocinski

At 15:46 you said "if that policy is greedy with respect to that value function" but I don't quite understand what you meant by that. Other than that, the video is crystal clear. Thank you for these videos.

hypershadow
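
For the question above, a small sketch of what "greedy with respect to a value function" usually means in [1]: in every state, pick the action whose one-step lookahead (expected reward plus discounted value of the next state) is largest. The two-state MDP, the value estimates, the discount of 0.9, and the function name below are all made up for illustration and do not come from the video.

```python
def greedy_policy(transitions, v, gamma=0.9):
    """For each state, return the action with the best one-step lookahead under v."""
    policy = {}
    for s, actions in transitions.items():
        def lookahead(a):
            # Expected reward plus discounted value of the next state.
            return sum(p * (r + gamma * v[s2]) for p, s2, r in actions[a])
        policy[s] = max(actions, key=lookahead)
    return policy


# Hypothetical MDP: transitions[state][action] = [(probability, next_state, reward)]
transitions = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
v = {"A": 5.0, "B": 10.0}  # hypothetical value estimates
policy = greedy_policy(transitions, v)
print(policy)
```

In this toy example the greedy policy moves from A to B and then stays in B, because those actions have the larger lookahead values under the assumed estimates.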

This is so well done! Explaining stuff well can be very difficult. Thanks a lot! I'm studying RL at a university course, but this was way more helpful!

vesk

Excellent video. Even though I have been studying RL for a while, the video clarified some previously learned concepts and gave me a better understanding of the topic.

usonian

This series of videos is really nice. I would love to see you go more into the theory/proofs of why policy iteration works... as another series. Once again, really good work.

katchdawgs

It turns out that in fact, algebra *is* fun, cool, and exciting

AcademicDisciplineHD

Bellman Equation - Explained!

Model Based Reinforcement Learning: Policy Iteration, Value Iteration, and Dynamic Programming

How to use Bellman Equation Reinforcement Learning | Bellman Equation Machine Learning Mahesh Huddar

Nonlinear Control: Hamilton Jacobi Bellman (HJB) and Dynamic Programming

Solving a Simple Finite Horizon Dynamic Programming Problem

The Bellman Equation | Trailer | Eric Bellman | Kirstie Bellman | Gabriel Bellman

Bellman Equation Definition

The Bellman Equations - 3

4 BELLMAN'S EQUATIONS III

The Bellman Equations Explained - RL Theory

Bellman equation | Bellman Backup | Optimal Value | Value Iteration | MDP

Bellman equation - made easy and clear

008 The Bellman Equation

How to Write a Bellman Equation

3 BELLMAN'S EQUATIONS II

The Bellman Equation | Macro Struggle

RL #21 Complete Derivation of Bellman Equation from scratch | The RL Series

Bellman Equations

Clear Explanation of Value Function and Bellman Equation (PART I) Reinforcement Learning Tutorial

15. Dynamic Programming, Part 1: SRTBOT, Fib, DAGs, Bowling

Introduction to reinforcement learning|Deriving the Bellman Equation in 3 steps in under 15 min!

AI03: Bellman Expectation Equation

MDP, Bellman Equations, Q-Learning - Implemented (10)