What is Q-Learning (back to basics)

#qlearning #qstar #rlhf

What is Q-Learning and how does it work? A brief tour through the background of Q-Learning, Markov Decision Processes, Deep Q-Networks, and other basics necessary to understand Q* ;)
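The core idea the video builds up to can be sketched as tabular Q-learning on a toy problem. This is a generic illustration, not code from the video; the chain environment, reward values, and hyperparameters are all made-up assumptions:

```python
import random

# A tiny deterministic chain MDP: states 0..3, reward 1.0 for reaching state 3.
# Actions: 0 = left, 1 = right. All values here are illustrative assumptions.
N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Move left/right along the chain; entering the last state gives reward 1."""
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

Q = [[0.0, 0.0] for _ in range(N_STATES)]
random.seed(0)
for _ in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection: mostly exploit, sometimes explore.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# The greedy policy learns to always move right, toward the reward.
policy = [max(ACTIONS, key=lambda act: Q[s][act]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1]
```

Deep Q-learning replaces the table `Q` with a neural network, but the update rule is the same Bellman-style bootstrap.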

OUTLINE:
0:00 - Introduction
2:00 - Reinforcement Learning
7:00 - Q-Functions
19:00 - The Bellman Equation
26:00 - How to learn the Q-Function?
38:00 - Deep Q-Learning
42:30 - Summary

Links:

If you want to support me, the best thing to do is to share the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Comments

this is how you leverage the hype like a true gentleman 😎

raunaquepatra

I have no time for the hype, but I have all the time in the world for a classic Yannic Kilcher paper explanation video

EdFormer

Thanks for such a solid, fundamental introduction to Q-learning, especially at a time when many are really excited about Q-star but few seem to try to understand its basic principles.

changtimwu

Thank you Yannic, your style of surfing the hype is the best!!!

Alilinpow

Love these paper videos, the reason I subscribed to the channel :)

qwertywifi

This was very informative. Thank you so much for sharing.

K.F-R

I would be very interested in seeing a series of paper/concept reviews such as this focusing on the state of the art in RL

OrdniformicRhetoric

thanks for posting this; good to see some real content

travisporco

Another awesome video from you Yannic! Gold material on this channel.

agenticmark

I need someone to upload the Q function to my brain so my life choices start making sense

ProblematicBitch

This is great. You’re a true wizard in explaining Q, and I love the anonymous look with the sunglasses. You’re a regular Q-anon shaman.

nickd

I will make sure to stay hydrated, thank you

drdca

I did deep q learning for my cs bachelors thesis way back. Thank you so much for reminding me of those memories.

ceezar

18:00 In chess terms, 'Reason 1' can be likened to: 1) Choosing a1 means you won't capture any of your opponent's pieces. 2) Opting for a2 allows you to swiftly capture a substantial piece.

vorushin

By far the most effective way of learning. Hacking at the essence, in a chain of thought manner.

matskjr

A good example of what you were talking about just before the Bellman equation would be that Move B (10 reward) will help capture a chess piece in the future, whereas Move A will move away from that outcome, or maybe even let the piece be taken by the opponent, making the 'next move' the 'policy' would want impossible.
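That chess intuition is exactly what the Bellman equation encodes: an action's value is its immediate reward plus the discounted value of the best follow-up. A minimal numeric sketch (the discount factor and all reward numbers are made-up illustrations, not from the video):

```python
GAMMA = 0.9  # discount factor (illustrative)

# Q-values of the successor states (made-up numbers):
# after Move A the best follow-up is worth only 2 (the capture is gone),
# after Move B the best follow-up is worth 30 (the capture is set up).
q_after_A = {"follow_up_1": 2.0, "follow_up_2": 1.0}
q_after_B = {"follow_up_1": 30.0, "follow_up_2": 5.0}

# Bellman: Q(s, a) = r(s, a) + gamma * max_a' Q(s', a')
q_A = 12.0 + GAMMA * max(q_after_A.values())  # bigger immediate reward, poor future
q_B = 10.0 + GAMMA * max(q_after_B.values())  # smaller immediate reward, great future

print(q_A, q_B)  # Move B wins despite the smaller immediate reward
```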

draken

I realize that I read this paper ten years ago. Now I'm ten years older omg.

tglvhvr

Old paper review - yeah! We missed that.

drhilm

My dude, that point you mention at 45:05, right at the end, about having the state and actions be the input, is exactly the question I've been trying to find an answer to. Seeing and hearing it mentioned twice, but each time you said you weren't going to talk about it, felt like a knife in the heart. If you don't do a video on it, do you have papers that talk through how this has been done? Great stuff either way; I was able to learn a bunch.
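On the state-and-action-input question raised above: the two common ways of parameterizing a Q-function can be sketched without any deep-learning library. The linear "networks" and weights below are purely illustrative stand-ins, not anything from the video:

```python
# Parameterization 1 (classic DQN): the network maps a state to one Q-value
# per action, so a single forward pass scores every discrete action at once.
def q_per_action(state):
    W = [[1.0, 0.5],   # illustrative fixed weights, one row per action
         [0.2, 2.0]]
    return [sum(w * x for w, x in zip(row, state)) for row in W]

# Parameterization 2: the network takes a (state, action) pair and returns a
# single scalar. Handy when actions are structured or continuous, but finding
# argmax_a then needs one forward pass per candidate action.
def q_state_action(state, action_onehot):
    w = [1.0, 0.5, 0.2, 2.0]  # illustrative weights over concatenated input
    x = state + action_onehot
    return sum(wi * xi for wi, xi in zip(w, x))

s = [1.0, 2.0]
print(q_per_action(s))                # scores for both actions in one pass
print(q_state_action(s, [0.0, 1.0]))  # score for action 1 only
```

The per-action output head is what the original DQN paper used for Atari, precisely because the greedy max over actions then costs a single forward pass.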

jackschultz

During your explanation, Dijkstra's algorithm came to my mind. They say that this Q* can increase the processing needs some 1000 times: you check all the paths in your graph and choose the ideal one.

EnricoGolfettoMasella