MIT 6.S094: Deep Reinforcement Learning for Motion Planning

This is Lecture 2 of MIT 6.S094: Deep Learning for Self-Driving Cars, taught in Winter 2017. The lecture introduces the types of machine learning, the neuron as a computational building block for neural networks, Q-learning, deep reinforcement learning, and the DeepTraffic simulation, which uses deep reinforcement learning for the motion-planning task.

INFO:

Links to individual lecture videos for the course:

Lecture 1: Introduction to Deep Learning and Self-Driving Cars

Lecture 2: Deep Reinforcement Learning for Motion Planning

Lecture 3: Convolutional Neural Networks for End-to-End Learning of the Driving Task

Lecture 4: Recurrent Neural Networks for Steering through Time

Lecture 5: Deep Learning for Human-Centered Semi-Autonomous Vehicles

CONNECT:
- If you enjoyed this video, please subscribe to this channel.
Comments

The concept of being able to virtually attend the class as it happens at MIT is awesome. Thank you, Lex, for taking the time to share the videos!

chuyhable

You are the best machine learning teacher I have ever seen. Thanks for your generosity in sharing your knowledge :)

jacobrafati

Learning about one of my favorite topics from Lex is just awesome. Thanks to this humble legend for sharing this!

cynicalmedia

It's amazing how technology allows us to access such high-quality educational content from anywhere in the world. Huge thanks to Lex for sharing these insightful and inspiring videos with us!

turhancan

Thanks a lot, it really helped me understand reinforcement learning. A year ago I was new to machine learning and knew only the basic structure of neural networks and the feedforward pass; I didn't know anything about backprop or derivative-based optimization, so I built a neural network that plays a simple game similar to the Atari games and trained it with a genetic algorithm that tries to maximize the player's score over a simulation round. Now it is really nice to find academic material on the subject.

marsilzakour
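
A minimal sketch of the kind of approach described in the comment above: evolving a small policy's weight vector with a genetic algorithm that maximizes game score, instead of using backprop. The play_game function is a hypothetical stand-in for whatever game simulation is actually used; everything here is illustrative, not the commenter's actual setup.

    import numpy as np

    N_WEIGHTS = 64        # size of the flattened weight vector (assumed)
    POP_SIZE = 50
    GENERATIONS = 100
    MUTATION_STD = 0.1

    def play_game(weights):
        # Hypothetical: run one episode with a policy defined by `weights`
        # and return its score. Faked here so the sketch is runnable.
        return -float(np.sum((weights - 0.5) ** 2))

    population = [np.random.randn(N_WEIGHTS) for _ in range(POP_SIZE)]
    for gen in range(GENERATIONS):
        scores = np.array([play_game(w) for w in population])
        # Truncation selection: keep the top half as parents.
        parents = [population[i] for i in np.argsort(scores)[-POP_SIZE // 2:]]
        # Refill the population with mutated copies of random parents.
        children = [parents[np.random.randint(len(parents))]
                    + MUTATION_STD * np.random.randn(N_WEIGHTS)
                    for _ in range(POP_SIZE - len(parents))]
        population = parents + children

    best = max(population, key=play_game)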

I enjoy how this lecture series slips in so many life lessons while masquerading as a series on the technical aspects of deep learning.

HarishNarayanan

First of all, you are a good human and a fantastic teacher, because you share knowledge with people who have not had the opportunity to study at a university.
Thanks for that, and God bless you.

jayhu

At 36:56 it seems like you can reduce the reward to Q(t+1) - Q(t), or just the simple increase in the "value" of the state in period t+1 over period t. Then the discount rate (y) can be applied to that gain to discount it back to time t. The learning rate (a) then becomes a "growth of future state" valuation. The most important thing is then that y * a > 1, otherwise your learning never overcomes the burden of the discount rate.
This is really similar to the dividend growth model of stock valuation:

D/(k-g)
D=dividend at time 0, k=discount rate, g=growth rate.

The strange similarity is that when the "learning rate" (which feels like it should be called an "applied learning rate") is greater than the discount rate, there is "growth" in future states; otherwise there is contraction (think the Dark Ages). In the dividend discount model, whenever the growth rate extrapolated into infinity is higher than the discount rate, the denominator goes to zero or below, and the valuation goes to infinity.

Yeah, I like this guy's analogies translating the bedrock of machine learning, etc. into fundamental life lessons.
Never stop learning... and then doing!

pauldacus
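
For reference, the tabular Q-learning update on that slide is usually written with learning rate α (the comment's "a") and discount factor γ (the comment's "y") in two distinct roles:

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

Here α scales the whole temporal-difference error, while γ discounts only the estimated value of the next state; the growth-versus-discount analogy above can be read against those two roles.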

It is interesting that many NAND gates can be used to implement a sum function, and then many such sum functions (neurons) can be used to learn NAND.

daryoushmehrtash
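
A tiny illustration of the second half of that observation: a single weighted-sum-and-threshold neuron with hand-picked weights (an illustrative choice, not from the lecture) already computes NAND, which is why networks of such units can in principle represent any logic function.

    # A single threshold neuron computing NAND: outputs 1 unless both inputs are 1.
    def nand_neuron(x1, x2, w=(-2.0, -2.0), b=3.0):
        z = w[0] * x1 + w[1] * x2 + b   # weighted sum plus bias
        return 1 if z > 0 else 0        # step (threshold) activation

    for x1 in (0, 1):
        for x2 in (0, 1):
            print(x1, x2, '->', nand_neuron(x1, x2))
    # Prints the NAND truth table: only (1, 1) maps to 0.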

Thank you, Lex, for uploading this awesome video here for free.

Tibetan-experience

Thanks very much, Lex. I greatly appreciate your effort in sharing your knowledge with the world. You rock :)

Sasipano

Wow, now we have subtitles!
Thank you very much!

TpBrass

I like Lex a lot, but I think the objective function for Q is wrong (32:49). Optimal Q-values are intended to maximize the *cumulative* future reward, not just the reward at the next time step. One could easily imagine that the best action to take in one's current state delivers a loss at the next step, but in the long term achieves the greatest net gain in reward.

sharp
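
For context, the usual textbook definition behind that point: the optimal Q-value is the expected discounted *cumulative* reward, and its recursive (Bellman) form only looks like a one-step objective because the max over Q* at the next state folds in all future reward:

    Q^*(s, a) = \mathbb{E}\left[ r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \mid s_t = s, a_t = a \right]
              = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \mid s_t = s, a_t = a \right]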

The trainer's presentation style is awesome.

khayyam

Am I the only one who finds the explanations quite cumbersome and not easily digestible? I'm having a hard time following some things: I have to pause, go back, rewatch segments, speculate on a lot of things, extrapolate on those speculations, and then rewatch hoping to match my speculations against stated facts to confirm my understanding is correct. I'm not an expert in teaching, nor am I a genius, but when a lesson leaves so many loose ends and raises more questions than it answers, it might not be properly optimized for teaching. I do appreciate the effort, though, and acknowledge that it's a difficult subject; I'm a visual learner, and it's a pain in the ass to find material on this subject that suits me.

Nightlurk

FYI - The reinforcement learning part starts at 19:46.

fandu

Hi, I wonder why the Q-learning equations are different on the same slide.
At 36:11 the equation above is Q[s', a'] = ..., but the pseudocode below it is Q[s, a] = ...

yanadsl
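
The two lines are consistent once you track which entry gets written: the update always writes to Q[s, a] for the transition just taken, and Q[s', a'] only appears inside the max used to form the bootstrap target. A minimal tabular sketch, assuming a hypothetical discrete environment whose step(a) returns (next_state, reward, done):

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
    ACTIONS = [0, 1, 2]   # hypothetical discrete action set
    Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

    def q_learning_step(env, s):
        # Epsilon-greedy action selection from the current Q estimates.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s_next, r, done = env.step(a)   # hypothetical environment interface
        # Bootstrap target uses the max over a' at the NEXT state...
        target = r if done else r + GAMMA * max(Q[s_next][a2] for a2 in ACTIONS)
        # ...but the write goes to Q[s][a], the pair actually taken.
        Q[s][a] += ALPHA * (target - Q[s][a])
        return s_next, done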

Thanks for the great lecture!
I have one doubt. At 56:43, in the demo video (Atari Breakout) of the model trained for 240 minutes, how did the ball get to the top portion of the screen without any path from the bottom (i.e., without first breaking through the bricks to open a path)?

arulkumar

Thanks, and the motivational-speech style is a good way to deliver the material!

enochsit

At 7:31 your slide shows a threshold activation function in the equation, but the animation shows a sigmoid activation. That might confuse some MIT folks.

prestn
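
For anyone comparing the two at that timestamp: the equation's hard threshold and the animation's sigmoid are the two common choices, and the sigmoid is just a smooth, differentiable stand-in for the step, which is what makes gradient-based training possible. A small sketch (threshold value chosen for illustration):

    import math

    def step(z, threshold=0.0):
        # Hard threshold, as in the classic perceptron equation.
        return 1.0 if z > threshold else 0.0

    def sigmoid(z):
        # Smooth, differentiable activation, as in the animation.
        return 1.0 / (1.0 + math.exp(-z))

    for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f'z={z:+.1f}  step={step(z):.0f}  sigmoid={sigmoid(z):.3f}')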