RL Course by David Silver - Lecture 3: Planning by Dynamic Programming

Comments

2:25 Introduction
12:38 Policy Evaluation
29:40 Policy Iteration
1:01:45 Value Iteration
1:28:53 Extensions to Dynamic Programming
1:38:30 Contraction Mapping

NganVu

I have to say, he is amazing at answering questions. In over 90% of cases, he fully understands what is being asked and responds with a well-formulated, straightforward answer.

johnhefele

The questions from the students are of very high quality - informative and representative. And no doubt they help make the lecture itself clearer.

sujiang

Watching these lectures a second time clears things up very well. David Silver is an excellent professor; his articulations are precise and calculated.

akarshrastogi

His explanations are great! I really had a hard time trying to understand these RL topics from the Stanford lectures. This makes it much clearer now. Love how he always comes back to how each algorithm fits into the larger context.

ReatchOfficial

After all these months of head-scratching, I now finally understand policy and value iteration.

Thanks to Mr. Silver.

zahash

Great lecture about the inner workings of dynamic programming! Prof. Silver's explanations and intuitions are very helpful. Thanks for releasing this material for free!

marloncajamarca

Dynamic programming: the fundamental building block for solving the Bellman equation (see the equation written out after the timestamps below).
0:36 Outline
2:25 Introduction
12:38 Policy Evaluation
29:40 Policy Iteration
1:01:45 Value Iteration
1:28:53 Extensions to Dynamic Programming
1:38:30 Contraction Mapping
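For reference, here is the Bellman expectation equation that iterative policy evaluation turns into a repeated update, as a small LaTeX sketch in the notation David Silver uses in the course:

% Bellman expectation equation for the state-value function v_pi;
% policy evaluation applies it repeatedly as an update over all states.
v_\pi(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s)
           \left( \mathcal{R}^a_s + \gamma \sum_{s' \in \mathcal{S}} \mathcal{P}^a_{ss'} \, v_\pi(s') \right)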

zhongchuxiong

After the first 3 lectures of the series, I would say it's the best lecture series on reinforcement learning I have taken. It helped me a lot when I studied optimal robotics control in my master's program. Thanks, D. Silver, for this course.

ThuongPham-wgbc

Thank you, David Silver, for the best explanation of RL I have ever seen.

mind-set

Awesome lectures!! You made it easy to achieve a really good understanding of reinforcement learning (I'm talking after watching the first 3 videos). Thanks for these great lectures!!

sergigalve

Thank you very much for making this public, I've learned a lot!

LightStorm

Great courses. Very intuitive about abstract concepts. I just tend to sit on the right side of my TV to make the image make sense.

Alexnotjones

The thing that David keeps mentioning while working backwards is called the recursive leap of faith. We assume the answer for the Nth state is already available and then make the corresponding recursive call, which in turn breaks the problem into subproblems - like computing Fibonacci(n) and assuming our function will correctly return the smaller cases. Then we keep breaking it down into subproblems, just as we do in mathematical induction.
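A minimal Python sketch of that "leap of faith" (the function name and cache are just illustrative): trust that the recursive calls for smaller inputs already return correct answers, and note that the cache of solved subproblems is exactly the overlapping-subproblem reuse that dynamic programming exploits.

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Leap of faith: assume fib(n - 1) and fib(n - 2) return the right
    # answers (the base cases anchor the induction), then combine them.
    # The cache stores every solved subproblem so it is computed once.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(30))  # 832040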

siddhantrai

His tutorials are gold. Thanks, DeepMind.

karthik-exdm

In the toy model with car rentals, some conservation laws are not quite respected. It is a minor detail, but it is still important to analyze the problem carefully first. If at the first location cars are requested and returned at an average rate of 3 and 3 per day, then after a month the total number of cars there stays the same on average, while the other location averages 4 requests to 2 returns, which means that after some finite time the dealer will run out of cars. In reality, the first location would have to average more than 3 returns a day. Minor, but important: questioning the assumptions we make when defining a problem is as important as solving it.
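A quick back-of-the-envelope check of that point, using the Poisson rates from the standard Jack's Car Rental setup (requests/returns of 3/3 and 4/2); the 20-car cap and overnight transfers are ignored here, so this only shows the average drift the comment describes.

# Mean requests and returns per day at each rental location, taken from
# the classic Jack's Car Rental example (Sutton & Barto, Example 4.2).
rates = {"location 1": (3, 3), "location 2": (4, 2)}  # (requests, returns)

for name, (requests, returns) in rates.items():
    # A Poisson(lambda) variable has mean lambda, so the expected daily
    # change in the car count at a location is E[returns] - E[requests].
    drift = returns - requests
    print(f"{name}: expected net change of {drift:+d} cars per day")

# Prints +0 for location 1 (balanced) and -2 for location 2, so without
# some extra inflow the second location eventually runs out of cars.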

chavdardanchev

This is quite possibly the best lecture I have sat through, either in person or online!

billykotsos

The person asking the question at 28:07 is mistaken, and the lecturer's answer is incorrect. The incorrect premise of the question is that the only "terminal state" is the lower right corner, and so the square (3, 3) (labelled from the bottom left) should be "further" from the terminal point than (4, 3). This is wrong: the top left corner is also part of the terminal state, and is closer to (3, 3). Dr. Silver's answer does not address this. Instead, he (apparently incorrectly) says the discrepancy is due to rounding error. However, the numbers shown are consistent with an exact solution of the problem by solving Poisson's equation for the random walk.
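For anyone who wants to check the numbers, here is a minimal sketch (plain Python, my own variable names) of iterative policy evaluation on the small gridworld from the lecture: uniform random policy, reward -1 per transition, gamma = 1, terminal squares in two opposite corners, and moves off the grid leave the state unchanged. The values converge toward exact integers (0, -14, -18, -20, -22), so equal -20.0 entries need not be a rounding artifact.

# Iterative policy evaluation on the 4x4 gridworld from the lecture.
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def policy_evaluation(theta=1e-9):
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in V:
            if s in TERMINALS:
                continue
            new_v = 0.0
            for dr, dc in ACTIONS:
                nr, nc = s[0] + dr, s[1] + dc
                if not (0 <= nr < N and 0 <= nc < N):
                    nr, nc = s                      # bounced off the edge
                new_v += 0.25 * (-1 + V[(nr, nc)])  # Bellman expectation backup
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v                            # in-place sweep
        if delta < theta:
            return V

V = policy_evaluation()
for r in range(N):
    print(" ".join(f"{V[(r, c)]:6.1f}" for c in range(N)))
# Converges (up to the stopping threshold) to
#    0.0  -14.0  -20.0  -22.0
#  -14.0  -18.0  -20.0  -20.0
#  -20.0  -20.0  -18.0  -14.0
#  -22.0  -20.0  -14.0    0.0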

shapeoperator

28:14 All 6 of those squares converge to exactly -20.0 when you solve the equations. That is because these values represent the expected return when starting from one of those squares and following the uniform random policy (1/4 for each direction), with a -1 reward for each transition. Even more interesting is to program this grid with a pawn that moves in a random direction each step (policy 1/4), respawning on that square each time it reaches a terminal square. After a minute of runs you will see the average return converge to -20.0...; it looks almost like magic, and of course it works perfectly for any other square. The lecturer was just a little thrown off by the question. Respect for such good lectures.
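And a minimal Monte Carlo version of that experiment (plain Python, my own names, same grid conventions as the policy-evaluation sketch above): drop a pawn on one of the -20 squares, walk uniformly at random collecting -1 per step until a terminal corner is reached, and average the returns over many episodes.

import random

# Monte Carlo check of one gridworld value.
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def episode_return(start):
    r, c = start
    total = 0
    while (r, c) not in TERMINALS:
        dr, dc = random.choice(ACTIONS)
        nr, nc = r + dr, c + dc
        if 0 <= nr < N and 0 <= nc < N:  # off-grid moves leave the pawn in place
            r, c = nr, nc
        total -= 1                       # -1 reward per transition, bounces included
    return total

start = (0, 2)  # one of the six squares whose exact value is -20
episodes = 200_000
avg = sum(episode_return(start) for _ in range(episodes)) / episodes
print(avg)  # ~ -20.0, up to Monte Carlo noise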

СергейВоробьёв-ылб

@1:16:00 In value iteration we produce a sequence v1->v2->v3->... He says that, say, v3 might not correspond to any policy. This makes sense, but we're really interested in finding a greedy policy using v3, which we can do as before. I find Silver's exposition slightly confusing at times.

edit: I think a student pointed out the same thing immediately after.
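A minimal sketch of that point on the same gridworld (plain Python, my own function names, same conventions as the sketches above): run a few synchronous sweeps of value iteration and then greedify the intermediate value function. The intermediate v_k may not be the value function of any policy, but extracting a greedy policy from it is always well defined.

# Value iteration on the 4x4 gridworld, then greedy-policy extraction
# from an intermediate value function v_k (gamma = 1, reward -1 per step,
# deterministic moves, off-grid moves stay put).
N = 4
TERMINALS = {(0, 0), (N - 1, N - 1)}
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(s, a):
    nr, nc = s[0] + a[0], s[1] + a[1]
    return (nr, nc) if 0 <= nr < N and 0 <= nc < N else s

def backup(V, s):
    # Bellman optimality backup: best one-step lookahead value.
    return max(-1 + V[step(s, a)] for a in ACTIONS.values())

def value_iteration(sweeps):
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    for _ in range(sweeps):  # synchronous sweeps
        V = {s: (0.0 if s in TERMINALS else backup(V, s)) for s in V}
    return V

def greedy_policy(V):
    # Greedifying any value function is well defined, even an
    # intermediate v_k that is not the value function of any policy.
    return {s: max(ACTIONS, key=lambda a: -1 + V[step(s, ACTIONS[a])])
            for s in V if s not in TERMINALS}

v3 = value_iteration(sweeps=3)   # an intermediate value function
print(greedy_policy(v3))         # ...but its greedy policy is perfectly usable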

kiuhnmmnhuik