Q-Learning: Model Free Reinforcement Learning and Temporal Difference Learning

Here we describe Q-learning, which is one of the most popular methods in reinforcement learning. Q-learning is a type of temporal difference learning. We discuss other TD algorithms, such as SARSA, and connections to biological learning through dopamine. Q-learning is also one of the most common frameworks for deep reinforcement learning.


This is a lecture in a series on reinforcement learning, following the new Chapter 11 from the 2nd edition of our book "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz

This video was produced at the University of Washington
Comments

I personally love the big picture perspective that Prof. Brunton always shows. Please, continue to make these high quality videos!

thiagocesarlousadamarsola

Professor, I must sincerely thank you for the astonishing quality of this video. You were able to clearly explain an advanced concept without oversimplifying it, going into the details and providing brilliant insights. I also sincerely thank you for saving my GPA on my R.L. exam 😆

davidelicalsi

CS PhD student here. This video provides such amazing content. Highly recommended.

TheSinashah

Prof. Brunton, you are amazing. I never expected someone to take so much time to explain a concept like TD. I'm one of the few people who hate reading textbooks to understand concepts; I'd rather watch a video or learn about it in class.
Thanks a lot

jashwantraj

Thank you for the outstanding production quality and content of these lectures! I especially enjoy the structure diagram organizing the different RL methods.

usonian

More casual example for TD-learning:

Imagine a curious robot exploring a maze, searching for a hidden treasure. Unlike Monte Carlo methods, which wait until the robot actually finds the treasure (the end of the episode) to learn, TD learning is all about learning on the fly. It uses what it already knows (the estimated values of different paths) and immediate feedback (rewards) to improve its predictions about future moves.

- The robot keeps track of a Q-value, Q(s_t, a_t), for each path choice, which tells it how good it thinks that choice is based on its past experience.
- When it takes a path and receives a reward r_t (like finding a clue), it compares what it just observed, r_t + \gamma Q(s_{t + 1}, a_{t + 1}), with what it expected, Q(s_t, a_t). This difference is called the prediction error (the temporal difference error).
- If the outcome is better than expected (positive error, r_t + \gamma Q(s_{t + 1}, a_{t + 1}) - Q(s_t, a_t) > 0), the robot increases the Q-value for that path, making it seem more attractive next time.
- If the outcome is worse than expected (negative error, r_t + \gamma Q(s_{t + 1}, a_{t + 1}) - Q(s_t, a_t) < 0), the robot decreases the Q-value, steering it away from less promising paths. A short code sketch of this update follows below.
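
A minimal sketch of the update rule described in these bullets, in a tabular (lookup-table) setting. This is not code from the lecture or the book; the maze size, the learning rate alpha, and the discount factor gamma are made-up illustrative values.

```python
import numpy as np

n_states, n_actions = 25, 4            # hypothetical 5x5 maze with 4 moves per cell
Q = np.zeros((n_states, n_actions))    # table of Q-values, one per (state, action) pair
alpha, gamma = 0.1, 0.9                # assumed learning rate and discount factor

def td_update(s, a, r, s_next, a_next):
    """One SARSA-style temporal-difference update from a single step of experience."""
    target = r + gamma * Q[s_next, a_next]   # reward plus discounted estimate of what follows
    delta = target - Q[s, a]                 # prediction (TD) error
    Q[s, a] += alpha * delta                 # nudge the old estimate toward the target
    return delta

# Example: the robot moved from state 3 to state 4 via action 1, chose action 2 next,
# and found a clue worth a reward of 1.
delta = td_update(s=3, a=1, r=1.0, s_next=4, a_next=2)
```

Replacing Q[s_next, a_next] with the maximum of Q[s_next, :] over actions turns this on-policy (SARSA) update into the off-policy Q-learning update.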

anlehoang

I was hoping that your next video would have been about Q-learning, and here it comes!

kalimantros

This is the best RL tutorial on the internet.

TheFitsome

Thank you, dear Prof. Brunton, for this outstanding lecture. The detailed explanations and focus on subtleties are so important. Looking forward to your next videos.

OmerBoehm

I enjoy your talks. They are very clear, well structured, and have the right level of detail. Thank you.

imanmossavat

Thank you! It's a great video. My understanding of TD learning deepened a lot.

haotianhang

I do like the description of Q-learning. I had come up with another analogy for why it makes sense: if you took the action of going out to a party and then happened to make some mistakes while there, we wouldn't want to say "you should never go out again." We'd want to reinforce the action of going out based on the best possible outcome of that night, not the suboptimal action that was taken once there.
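
To make the analogy concrete, here is a small illustrative sketch contrasting the two bootstrap targets it alludes to; the Q-table, reward, and indices are hypothetical, not taken from the lecture.

```python
import numpy as np

Q = np.zeros((25, 4))           # hypothetical Q-table (states x actions)
gamma = 0.9                     # assumed discount factor
r, s_next, a_next = 1.0, 4, 2   # one hypothetical step of experience

# SARSA (on-policy): bootstrap from the action actually taken next,
# even if it was a mistake made "at the party".
sarsa_target = r + gamma * Q[s_next, a_next]

# Q-learning (off-policy): bootstrap from the best action available in the
# next state -- judge "going out" by the best possible outcome of the night.
q_learning_target = r + gamma * np.max(Q[s_next])
```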

complexobjects

Thank you so much for using very relevant analogies and very clear explanations. I think I have a much better grasp of the concepts behind Temporal Difference learning now.

areebayubi

These are fantastic lectures. I use them as an alternative explanation to David Silver's DeepMind x UCL 2015 lectures on the same topic; the different perspective really suits how my brain understands RL. Thank you!!

marzs.szzzzz

This was the best explanation ever!
Thank you so much, professor!

BoltzmannVoid

Excellent class! Extremely easy to understand!

cruise

Somehow I find that the explanations given by Prof. Brunton are easier to understand than those provided by video lectures from Stanford (which are also available on Youtube).

antimon

I'm so very grateful for these videos you make. Keep up the good work.

nnanyereugoemmanuel

Thanks a lot. Not just the math but also the intuition that I was looking for.

prateekcaire

The video quality is incredible lol and all the concepts are explained extremely clearly OMG!!
Brilliant masterpiece bro, KEEP GOING!!

denchen