RL Course by David Silver - Lecture 6: Value Function Approximation


Comments

I'm grateful that David was successfully sampled in this iteration of the universe.

TheAIEpiphany

"Life is one big training set"
D. Silver, 2015

Chrnalis

0:00 Motivations
0:35 Outline

0:35 Large-Scale Reinforcement Learning
3:55 Value Function Approximation
8:40 Types of Value Function Approximation
11:55 Which Function Approximator?

15:38 Incremental Methods
15:43 Gradient Descent
17:35 Value Function Approx. by Gradient Descent
21:35 Feature Vectors
23:23 Linear Value Function Approximation
28:43 Table Lookup Features
30:50 Incremental Prediction Algorithms
31:10 Monte-Carlo with Value Function Approximation
37:33 TD Learning with Value Function Approximation
41:56 TD(lambda) with Value Function Approximation

49:00 Control with Value Function Approximation
52:30 Action-Value Function Approximation
53:50 Linear Action-Value Function Approximation
55:20 Incremental Control Algorithms
56:20 Linear Sarsa with Coarse Coding in Mountain Car

1:04:30 Study of lambda: Should we Bootstrap?
1:06:10 Baird's Counterexample
1:06:30 Parameter Divergence in Baird's Counterexample
1:06:50 Convergence of Prediction Algorithms
1:08:00 Gradient Temporal-Difference Learning
1:09:00 Convergence of Control Algorithms

1:10:19 Batch Methods
1:12:30 Batch Reinforcement Learning
1:13:30 Least Squares Prediction
1:15:25 Stochastic Gradient Descent with Experience Replay
1:17:25 Experience Replay in Deep Q-Networks (DQN)
1:24:46 DQN in Atari
1:26:00 How much does DQN help?
1:27:35 Linear Least Squares Prediction (2)
1:32:29 Convergence of Linear Least Squares Prediction Algorithms
1:32:50 Least Squares Policy Iteration
1:34:15 Chain Walk Example
1:35:00 LSPI in Chain Walk: Action-Value Function
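The prediction segment above (30:50-41:56) can be sketched in a few lines. Below is a minimal semi-gradient TD(0) update with linear, table-lookup features; the toy 3-state chain, step size, and discount are my own illustrative assumptions, not taken from the lecture:

```python
import numpy as np

def td0_linear(episodes, features, alpha=0.1, gamma=0.9):
    """Learn weights w so that v(s) ~= w . x(s) via semi-gradient TD(0)."""
    n = features(0).shape[0]
    w = np.zeros(n)
    for episode in episodes:                # episode: list of (s, r, s_next, done)
        for s, r, s_next, done in episode:
            x = features(s)
            v = w @ x
            v_next = 0.0 if done else w @ features(s_next)
            td_error = r + gamma * v_next - v   # TD target minus current estimate
            w += alpha * td_error * x           # step along the feature vector
    return w

# One-hot (table-lookup) features for a 3-state chain: 0 -> 1 -> 2 (terminal)
features = lambda s: np.eye(3)[s]
# Deterministic episode with reward 1 on the final transition
episode = [(0, 0.0, 1, False), (1, 1.0, 2, True)]
w = td0_linear([episode] * 500, features)
```

With one-hot features this reduces exactly to tabular TD(0), which is the point made on the "Table Lookup Features" slide: table lookup is the special case of linear approximation.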

yasseraziz

This guy is definitely the best at explaining RL.

proximo

It's going to be pretty confusing the first time you hear about it, but give it time: try to understand some of it, then come back and watch it again, and you'll see how much more you can comprehend.

MinhVu-fohd

This video is yet another example that the complexity of a subject really depends on the one who is teaching it. Thank you David, for making RL so much more accessible and understandable! It is a real pleasure listening to those lectures of yours ❤

florentinrieger

This is by far the best course on DRL I have ever watched. I was kind of lost among the different notations in different papers, although I can code many of them up. But this video summarizes the root of the problems and the solutions concisely, which lets me make sense of what I've learned in the past. Thanks for the great work!

junzhu

2020 and still watching this to refresh my knowledge 🙋‍♀️

LunahLiu

David Silver is an intelligent and humble person. He explains things very well; I'm glad that I came across his lectures.

shyomd

Holy cow, in the last 30 minutes this lecture goes completely off the rails. The number of concepts introduced and the number of slides shown increase exponentially towards the end :-)

martagutierrez

1:38 Introduction
15:28 Incremental Methods
1:12:17 Batch Methods
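The Batch Methods part of the lecture motivates storing experience and reusing it. A minimal sketch of the replay-buffer idea behind experience replay (around 1:15:25); the capacity and batch size are my own illustrative assumptions:

```python
import random
from collections import deque

class ReplayBuffer:
    """Store transitions; sample random minibatches to break the
    correlation between consecutive samples along a trajectory."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out

    def add(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        # Uniform sampling decorrelates the minibatch
        return random.sample(self.buffer, batch_size)

buf = ReplayBuffer()
for t in range(100):
    buf.add(t, 0, 1.0, t + 1, False)   # toy transitions for illustration
batch = buf.sample(8)
```

In DQN this buffer is combined with a periodically frozen target network, the two stabilization tricks the "How much does DQN help?" slide quantifies.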

NganVu

It's amazing how intuitive these concepts become after watching these lectures. David Silver makes the math and theory behind RL, which seem so hard to grasp, impossible to forget.

aidankennedy

This is a lot of knowledge that probably took over 20 years to develop to this level of depth. The question is why someone would share it for free like this. It's a GIFT!

willlogs

Golden information for free. Thanks, DeepMind! We need a more up-to-date course, though :/

robosergTV

This reminds me of the quote by Lex Fridman that goes, "All machine learning is supervised learning. What differs is how that supervision is done."

existenence

I don't really care about the math in the lecture; I just enjoy listening to David talk about state-of-the-art RL and suddenly catching his ideas.

ThuongPham-wgbc

Very easy to understand! Outstanding lecture! Outstanding teacher!

trantandat

It would be great if this video had subtitles.

TheKovosh

Amazing lecture. Thanks for the upload.

SadmanSakibEnan

16:39
- the negative gradient is the direction of steepest descent!
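To be precise about the point at 16:39: the gradient points uphill, so gradient descent steps along the *negative* gradient. A toy sketch of one-dimensional gradient descent; the quadratic objective is my own example, not from the lecture:

```python
# Minimize J(w) = (w - 3)^2, whose minimum is at w = 3.

def grad_J(w):
    return 2.0 * (w - 3.0)   # dJ/dw: points away from the minimum

w, alpha = 0.0, 0.1          # start far from the optimum; small step size
for _ in range(100):
    w -= alpha * grad_J(w)   # the minus sign moves *against* the gradient
```

Dropping the minus sign would ascend the objective instead, which is exactly the confusion the timestamped remark is about.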
