Q-Learning: Model Free Reinforcement Learning and Temporal Difference Learning

Here we describe Q-learning, which is one of the most popular methods in reinforcement learning. Q-learning is a type of temporal difference learning. We discuss other TD algorithms, such as SARSA, and connections to biological learning through dopamine. Q-learning is also one of the most common frameworks for deep reinforcement learning.


This is a lecture in a series on reinforcement learning, following the new Chapter 11 from the 2nd edition of our book "Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control" by Brunton and Kutz

This video was produced at the University of Washington
Comments

I personally love the big picture perspective that Prof. Brunton always shows. Please, continue to make these high quality videos!

thiagocesarlousadamarsola

Professor, I must sincerely thank you for the astonishing quality of this video. You were able to clearly explain an advanced concept without oversimplifying it, going into the details and providing brilliant insights. I also sincerely thank you for saving my GPA on my R.L. exam 😆

davidelicalsi

CS PhD student here. This video provides such amazing content. Highly recommended.

TheSinashah

Prof. Brunton, you are amazing. I never expected someone to take so much time to explain a concept like TD. I'm one of the few people who hate reading textbooks to understand concepts; I'd rather watch a video or learn about it in class.
Thanks a lot

jashwantraj

Thank you for the outstanding production quality and content of these lectures! I especially enjoy the structure diagram organizing the different RL methods.

usonian

More casual example for TD-learning:

Imagine a curious robot exploring a maze, searching for a hidden treasure. Unlike Monte Carlo methods, which wait until the robot actually finds the treasure (the end of the episode) to learn, TD learning is all about learning on the fly. It uses what it already knows (the estimated values of different paths) and immediate feedback (rewards) to improve its predictions about future moves.

- The robot keeps track of a Q-value, Q(s_t, a_t), for each path choice, which tells it how good it thinks that choice is based on its past experience.
- When it takes a path and receives a reward r_t (like finding a clue), it compares what it just observed, r_t + \gamma Q(s_{t + 1}, a_{t + 1}), with what it expected, Q(s_t, a_t). This difference is called the prediction error (the temporal difference error).
- If the outcome is better than expected (positive error, r_t + \gamma Q(s_{t + 1}, a_{t + 1}) - Q(s_t, a_t) > 0), the robot increases the Q-value for that path, making it seem more attractive next time.
- If the outcome is worse than expected (negative error, r_t + \gamma Q(s_{t + 1}, a_{t + 1}) - Q(s_t, a_t) < 0), the robot decreases the Q-value, steering it away from less promising paths. A short code sketch of this update follows below.
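
A minimal sketch of the update rule described in these bullets, in a tabular (lookup-table) setting. This is not code from the lecture or the book; the maze size, the learning rate alpha, and the discount factor gamma are made-up illustrative values.

```python
import numpy as np

n_states, n_actions = 25, 4            # hypothetical 5x5 maze with 4 moves per cell
Q = np.zeros((n_states, n_actions))    # table of Q-values, one per (state, action) pair
alpha, gamma = 0.1, 0.9                # assumed learning rate and discount factor

def td_update(s, a, r, s_next, a_next):
    """One SARSA-style temporal-difference update from a single step of experience."""
    target = r + gamma * Q[s_next, a_next]   # reward plus discounted estimate of what follows
    delta = target - Q[s, a]                 # prediction (TD) error
    Q[s, a] += alpha * delta                 # nudge the old estimate toward the target
    return delta

# Example: the robot moved from state 3 to state 4 via action 1, chose action 2 next,
# and found a clue worth a reward of 1.
delta = td_update(s=3, a=1, r=1.0, s_next=4, a_next=2)
```

Replacing Q[s_next, a_next] with the maximum of Q[s_next, :] over actions turns this on-policy (SARSA) update into the off-policy Q-learning update.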

anlehoang

I was hoping that your next video would have been about Q-learning, and here it comes!

kalimantros

This is the best RL tutorial on the internet.

TheFitsome

Thank you, dear Prof. Brunton, for this outstanding lecture. The detailed explanations and focus on subtleties are so important. Looking forward to your next videos.

OmerBoehm

I enjoy your talks. They are very clear, well structured, and have the right level of detail. Thank you.

imanmossavat

Thank you! It's a great video. My understanding of TD learning deepened a lot.

haotianhang

I do like the description of Q-learning. I had come up with another analogy for why it makes sense: if you took the action of going out to a party and then happened to make some mistakes while there, we wouldn't want to say "you should never go out again." We'd want to reinforce the action of going out based on the best possible outcome of that night, not the suboptimal action that was taken once there.
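
To make the analogy concrete, here is a small illustrative sketch contrasting the two bootstrap targets it alludes to; the Q-table, reward, and indices are hypothetical, not taken from the lecture.

```python
import numpy as np

Q = np.zeros((25, 4))           # hypothetical Q-table (states x actions)
gamma = 0.9                     # assumed discount factor
r, s_next, a_next = 1.0, 4, 2   # one hypothetical step of experience

# SARSA (on-policy): bootstrap from the action actually taken next,
# even if it was a mistake made "at the party".
sarsa_target = r + gamma * Q[s_next, a_next]

# Q-learning (off-policy): bootstrap from the best action available in the
# next state -- judge "going out" by the best possible outcome of the night.
q_learning_target = r + gamma * np.max(Q[s_next])
```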

complexobjects

Thank you so much for using very relevant analogies and very clear explanations. I think I have a much better grasp of the concepts behind Temporal Difference learning now.

areebayubi

These are fantastic lectures. I use them as an alternative explanation to David Silver's DeepMind x UCL 2015 lectures on the same topic; the different perspective really suits how my brain understands RL. Thank you!!

marzs.szzzzz

This was the best explanation ever!
Thank you so much, professor!

BoltzmannVoid

Excellent class! Extremely easy to understand!

cruise

Somehow I find that the explanations given by Prof. Brunton are easier to understand than those provided by video lectures from Stanford (which are also available on Youtube).

antimon

I'm so very grateful for these videos you make. Keep up the good work.

nnanyereugoemmanuel

Thanks a lot. Not just the math but also the intuition that I was looking for.

prateekcaire

The video quality is incredible lol and all the concepts are explained extremely clearly OMG!!
Brilliant masterpiece bro, KEEP GOING!!

denchen