MIT 6.S191: Reinforcement Learning

MIT Introduction to Deep Learning 6.S191: Lecture 5
Deep Reinforcement Learning
Lecturer: Alexander Amini
2024 Edition

Lecture Outline:
0:00 - Introduction
2:20 - Classes of learning problems
6:33 - Definitions
12:30 - The Q function
17:29 - Deeper into the Q function
23:12 - Deep Q Networks
30:36 - Atari results and limitations
34:24 - Policy learning algorithms
39:31 - Discrete vs continuous actions
43:21 - Training policy gradients
49:10 - RL in real life
51:33 - VISTA simulator
53:24 - AlphaGo and AlphaZero and MuZero
58:58 - Summary

Subscribe to stay up to date with new deep learning lectures at MIT, or follow us @MITDeepLearning on Twitter and Instagram to stay fully-connected!!
Comments

One of the best intros to RL. I'd recommend that every student interested in this field watch this amazing lecture. I just finished it at 1:40 AM... Now waiting for the actor-critic RL agent lecture to be released soon... Thanks and good night.

izharulhaq

Babe wake up, a new 6.S191 lecture just dropped

visheshphutela

Amazing intro to the subject. Since RL is closely related to control theory, it is mandatory to have a good background in control-theory topics such as state-space models and optimal control.

artukikemty

I'm curious to listen to this lecture; I need more concepts to apply in my thesis, and I'm looking forward to it.

gamalieliissacnyambacha

Lovely lecture.❤

A self-driving car operates in a dynamic environment, as compared to a gaming environment; this may be worth mentioning.

hrishabhg

Awesome job. Just curious whether someone can explain how the target part of the loss function at 26:40 was computed.

Asif-fpgy
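
On the target question above: that part of the lecture covers deep Q-learning, where the target is built from the observed reward plus the discounted maximum Q-value of the next state, and the network's own output Q(s, a) is the prediction it is compared against. A minimal sketch of the usual form of that loss, noting that the exact notation on the slide may differ:

    % DQN-style loss: squared difference between the bootstrapped target
    % (observed reward r plus discounted best next-state Q-value) and the
    % network's predicted Q-value for the action actually taken.
    L(\theta) = \mathbb{E}\Big[\big(\underbrace{r + \gamma \max_{a'} Q(s', a')}_{\text{target}} - \underbrace{Q(s, a)}_{\text{predicted}}\big)^{2}\Big]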

Can you advise my startup? We applied to YC. We want to set up an Indian team and use RLHF as well as SimPO to agentify hospital systems and remove the inefficiencies in current hospital systems. I'm an Aussie coming to America. We have hardware as well; I've been in Guangzhou for the last 6 weeks finding the best containers and cameras, trying to train models for gauging container volume to measure remaining stock.

Crashrapescrypto

At 10:30, shouldn't the equation for total reward be a summation of rewards from i = 0 to i = t? But in the equation it runs from t to infinity... why?

melvinkuriakose
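
On the summation question above: R_t is the total future reward from time t onward, i.e. everything the agent can still collect, not the reward accumulated from 0 to t, which is why the sum starts at i = t. A short sketch of the usual definitions (the slide's exponent convention for the discounted version may differ slightly):

    % Undiscounted return from time t: the sum of all *future* rewards.
    R_t = \sum_{i=t}^{\infty} r_i
    % Discounted return with discount factor 0 < \gamma < 1, which weights
    % near-term rewards more heavily and keeps the infinite sum finite.
    R_t = \sum_{i=t}^{\infty} \gamma^{i} r_i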

Transformers can be used as a direct replacement for deep RL, since they can process sequences as well. There is an article on Medium about this alternative.

artukikemty

Hi, when I tried to run the model-building part of Lab 1, the line "tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None])" does not work, and the error says batch_input_shape is an unrecognized keyword argument to Embedding. Has anyone else encountered this problem? I looked through the tf.keras.layers.Embedding documentation and couldn't find anything to replace it with... What did you do to solve it? Thanks!

wbwufuq
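
On the Lab 1 error above: this looks like the newer Keras API (Keras 3, shipped with recent TensorFlow releases), which no longer accepts batch_input_shape as a layer constructor argument. One possible workaround is to create the Embedding layer without it and supply the batch shape when building the model; the layer sizes below are placeholder values, not the lab's actual ones:

    import tensorflow as tf

    # Placeholder values for illustration; use the lab's own settings.
    vocab_size, embedding_dim, rnn_units, batch_size = 256, 64, 128, 32

    model = tf.keras.Sequential([
        # Newer Keras versions reject batch_input_shape as a layer argument,
        # so the Embedding layer is created without it.
        tf.keras.layers.Embedding(vocab_size, embedding_dim),
        tf.keras.layers.LSTM(rnn_units, return_sequences=True),
        tf.keras.layers.Dense(vocab_size),
    ])

    # Supply the (batch_size, sequence_length) shape when building instead.
    model.build(input_shape=(batch_size, None))
    model.summary()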

Can one conclude from the AlphaGo vs. AlphaZero showcase that the bottleneck to "achieving" AGI/ASI is us humans and the ethical/safety restrictions we have set?

christianrink

Please repeat the questions; the question askers' audio is blown out or unintelligible.
Some of the questions make it into the captions, but not all.
The professor's mic is perfect, though, with a great mix: one of the few series where you don't have to keep the volume maxed out the whole time.

TheNewton

Is there an application of reinforcement learning for subsurface reservoir simulation?

ikpesuemmanuel

Kenchin kokoro no.tabi.Study of the waste.

ltvwryx