RL Course by David Silver - Lecture 3: Planning by Dynamic Programming

Comments

2:25 Introduction
12:38 Policy Evaluation
29:40 Policy Iteration
1:01:45 Value Iteration
1:28:53 Extensions to Dynamic Programming
1:38:30 Contraction Mapping

NganVu

I have to say, he is amazing at answering questions. In over 90% of cases, he fully understands what is being asked and responds with a well-formulated, straightforward answer.

johnhefele

The questions from the students are of very high quality - informative and representative. And no doubt they help make the lecture itself clearer.

sujiang

Watching these lectures a second time clears things up very well. David Silver is an excellent professor; his articulations are precise and calculated.

akarshrastogi

His explanations are great! I really had a hard time trying to understand these RL topics from the Stanford lectures. This makes it much clearer now. Love how he always comes back to how each algorithm fits into the larger context.

ReatchOfficial

After all these months of head-scratching, I now finally understand policy and value iteration.

Thanks to Mr. Silver.

zahash

Great lecture about the inner workings of dynamic programming! Prof. Silver's explanations and intuitions are very helpful. Thanks for releasing this material for free!

marloncajamarca

Dynamic programming: the fundamental building block for solving the Bellman equation (see the equation written out after the timestamps below).
0:36 Outline
2:25 Introduction
12:38 Policy Evaluation
29:40 Policy Iteration
1:01:45 Value Iteration
1:28:53 Extensions to Dynamic Programming
1:38:30 Contraction Mapping
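For reference, here is the Bellman expectation equation that iterative policy evaluation turns into a repeated update, as a small LaTeX sketch in the notation David Silver uses in the course:

% Bellman expectation equation for the state-value function v_pi;
% policy evaluation applies it repeatedly as an update over all states.
v_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s)
           \left( \mathcal{R}^a_s + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}^a_{ss'} \, v_\pi(s') \right)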

zhongchuxiong

After the first 3 lectures of the series, I would say it's the best lecture series on reinforcement learning I have taken. It helped me a lot when I studied optimal robotics control in my master's program. Thanks, D. Silver, for this course.

ThuongPham-wgbc

Thank you, David Silver, for the best explanation of RL I have ever seen.

mind-set

Awesome lectures!! You made it easy to achieve a really good understanding of reinforcement learning (I'm talking after watching the first 3 videos). Thanks for these great lectures!!

sergigalve

Thank you very much for making this public, I've learned a lot!

LightStorm

Great courses. Very intuitive about abstract concepts. I just tend to sit on the right side of my TV to make the image make sense.

Alexnotjones

The thing that David keeps mentioning while working backwards is called the recursive leap of faith. We assume the answer for the Nth state is already available and then make the corresponding recursive call, which in turn breaks the problem into subproblems - like computing Fibonacci(n) and assuming our function will correctly return the smaller cases. Then we keep breaking it down into subproblems, just as we do in mathematical induction.
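A minimal Python sketch of that "leap of faith" (the function name and cache are just illustrative): trust that the recursive calls for smaller inputs already return correct answers, and note that the cache of solved subproblems is exactly the overlapping-subproblem reuse that dynamic programming exploits.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Leap of faith: assume fib(n - 1) and fib(n - 2) return the right
    # answers (the base cases anchor the induction), then combine them.
    # The cache stores every solved subproblem so it is computed once.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040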

siddhantrai

His tutorials are gold. Thanks, DeepMind.

karthik-exdm

In the toy model with car rentals, some conservation laws are not quite respected. It is a minor detail, but it is still important to analyze the problem carefully first. If at the first location cars are requested and returned at an average rate of 3 and 3 per day, then after a month the total number of cars there stays the same on average, while the other location averages 4 requests to 2 returns, which means that after some finite time the dealer will run out of cars. In reality, the first location would have to average more than 3 returns a day. Minor, but important: questioning the assumptions we make when defining a problem is as important as solving it.
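A quick back-of-the-envelope check of that point, using the Poisson rates from the standard Jack's Car Rental setup (requests/returns of 3/3 and 4/2); the 20-car cap and overnight transfers are ignored here, so this only shows the average drift the comment describes.

# Mean requests and returns per day at each rental location, taken from
# the classic Jack's Car Rental example (Sutton & Barto, Example 4.2).
rates = {"location 1": (3, 3), "location 2": (4, 2)}  # (requests, returns)

for name, (requests, returns) in rates.items():
    # A Poisson(lambda) variable has mean lambda, so the expected daily
    # change in the car count at a location is E[returns] - E[requests].
    drift = returns - requests
    print(f"{name}: expected net change of {drift:+d} cars per day")

# Prints +0 for location 1 (balanced) and -2 for location 2, so without
# some extra inflow the second location eventually runs out of cars.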

chavdardanchev

This is quite possibly the best lecture I have sat through, either in person or online!

billykotsos

The person asking the question at 28:07 is mistaken, and the lecturer's answer is incorrect. The incorrect premise of the question is that the only "terminal state" is the lower right corner, and so the square (3, 3) (labelled from the bottom left) should be "further" from the terminal point than (4, 3). This is wrong: the top left corner is also part of the terminal state, and is closer to (3, 3). Dr. Silver's answer does not address this. Instead, he (apparently incorrectly) says the discrepancy is due to rounding error. However, the numbers shown are consistent with an exact solution of the problem by solving Poisson's equation for the random walk.
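For anyone who wants to check the numbers, here is a minimal sketch (plain Python, my own variable names) of iterative policy evaluation on the small gridworld from the lecture: uniform random policy, reward -1 per transition, gamma = 1, terminal squares in two opposite corners, and moves off the grid leave the state unchanged. The values converge toward exact integers (0, -14, -18, -20, -22), so equal -20.0 entries need not be a rounding artifact.

# Iterative policy evaluation on the 4x4 gridworld from the lecture.
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def policy_evaluation(theta=1e-9):
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue
            new_v = 0.0
            for dr, dc in ACTIONS:
                nr, nc = s[0] + dr, s[1] + dc
                if not (0 <= nr < N and 0 <= nc < N):
                    nr, nc = s                      # bounced off the edge
                new_v += 0.25 * (-1 + V[(nr, nc)])  # Bellman expectation backup
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v                            # in-place sweep
        if delta < theta:
            return V

V = policy_evaluation()
for r in range(N):
    print(" ".join(f"{V[(r, c)]:6.1f}" for c in range(N)))
# Converges (up to the stopping threshold) to
#    0.0  -14.0  -20.0  -22.0
#  -14.0  -18.0  -20.0  -20.0
#  -20.0  -20.0  -18.0  -14.0
#  -22.0  -20.0  -14.0    0.0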

shapeoperator

28:14 All 6 of those squares converge to exactly -20.0 when you solve the equations. That is because these values represent the expected return when starting from one of those squares and following the uniform random policy (1/4 for each direction), with a -1 reward for each transition. Even more interesting is to program this grid with a pawn that moves in a random direction each step (policy 1/4), respawning on that square each time it reaches a terminal square. After a minute of runs you will see the average return converge to -20.0...; it looks almost like magic, and of course it works perfectly for any other square. The lecturer was just a little thrown off by the question. Respect for such good lectures.
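And a minimal Monte Carlo version of that experiment (plain Python, my own names, same grid conventions as the policy-evaluation sketch above): drop a pawn on one of the -20 squares, walk uniformly at random collecting -1 per step until a terminal corner is reached, and average the returns over many episodes.

import random

# Monte Carlo check of one gridworld value.
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def episode_return(start):
    r, c = start
    total = 0
    while (r, c) not in TERMINALS:
        dr, dc = random.choice(ACTIONS)
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:  # off-grid moves leave the pawn in place
            r, c = nr, nc
        total -= 1                       # -1 reward per transition, bounces included
    return total

start = (0, 2)  # one of the six squares whose exact value is -20
episodes = 200_000
avg = sum(episode_return(start) for _ in range(episodes)) / episodes
print(avg)  # ~ -20.0, up to Monte Carlo noise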

СергейВоробьёв-ылб

@1:16:00 In value iteration we produce a sequence v1->v2->v3->... He says that, say, v3 might not correspond to any policy. This makes sense, but we're really interested in finding a greedy policy using v3, which we can do as before. I find Silver's exposition slightly confusing at times.

edit: I think a student pointed out the same thing immediately after.
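A minimal sketch of that point on the same gridworld (plain Python, my own function names, same conventions as the sketches above): run a few synchronous sweeps of value iteration and then greedify the intermediate value function. The intermediate v_k may not be the value function of any policy, but extracting a greedy policy from it is always well defined.

# Value iteration on the 4x4 gridworld, then greedy-policy extraction
# from an intermediate value function v_k (gamma = 1, reward -1 per step,
# deterministic moves, off-grid moves stay put).
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(s, a):
    nr, nc = s[0] + a[0], s[1] + a[1]
    return (nr, nc) if 0 <= nr < N and 0 <= nc < N else s

def backup(V, s):
    # Bellman optimality backup: best one-step lookahead value.
    return max(-1 + V[step(s, a)] for a in ACTIONS.values())

def value_iteration(sweeps):
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    for _ in range(sweeps):  # synchronous sweeps
        V = {s: (0.0 if s in TERMINALS else backup(V, s)) for s in V}
    return V

def greedy_policy(V):
    # Greedifying any value function is well defined, even an
    # intermediate v_k that is not the value function of any policy.
    return {s: max(ACTIONS, key=lambda a: -1 + V[step(s, ACTIONS[a])])
            for s in V if s not in TERMINALS}

v3 = value_iteration(sweeps=3)   # an intermediate value function
print(greedy_policy(v3))         # ...but its greedy policy is perfectly usable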

kiuhnmmnhuik