What is Q* | Reinforcement learning 101 & Hypothesis

Comments
Author

Anything else I missed about Q*? Leave a comment and let me know!

AIJasonZ
Author

That AlphaGo documentary remains so good, even a few years later. They found the human empathy and passion in a cold technical challenge, all without any narration. It gets me excited about hard tech.

TheDessertFaux
Author

Really well put together, Jason, with use of interviews and clips.

LukePuplett
Author

Great overview! Jason, your videos on the AI topic are the best!

00:00 🤖 *"Q Star" is generating a lot of discussion in the AI community, and it's associated with OpenAI's recent actions, but its exact nature remains speculative.*
01:08 🎮 *Reinforcement learning is a machine learning framework where an agent learns from trial and error, aiming to maximize future rewards. It involves policy networks and value networks.*
03:25 🧠 *Reinforcement learning allows AI agents to self-play and discover new strategies, as demonstrated by DeepMind's achievements in games like Breakout and AlphaGo.*
08:01 📚 *There's speculation that "Q Star" could involve using policy networks and value networks, similar to AlphaGo, to improve reasoning and logic in large language models like GPT.*
11:14 🐍 *You can experiment with reinforcement learning in simple games with open-source projects, even if you're new to the field.*

HarpaAI
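
Editor's sketch: the trial-and-error loop summarized at 01:08 can be illustrated with tabular Q-learning on a toy problem. This is not code from the video. The five-state corridor, the reward of 1 at the goal, and the names `step`, `train`, and `greedy_policy` are all assumptions made up for illustration.

```python
import random

N_STATES = 5          # hypothetical corridor: states 0..4, state 4 is the goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor for future rewards
ALPHA = 0.5           # learning rate
EPSILON = 0.2         # probability of taking a random exploratory action


def step(state, action):
    """Toy environment: walls at both ends, reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, seed=0):
    """Learn a table q[state][action_index] by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))          # explore
            else:
                best = max(q[state])                     # exploit, ties broken at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for every non-terminal state."""
    return [ACTIONS[row.index(max(row))] for row in q[:-1]]
```

Calling `greedy_policy(train())` should give "move right" (`1`) in every non-terminal state, since that is the shortest route to the only reward in this toy setup.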
Author

Thanks for organising the insights! This hypothesis is very exciting.

jasonfinance
Author

A clear definition of AGI has been difficult to find. Temporarily constraining it to a specific field for evaluation might be helpful. For instance, AGI was achieved in Chess and Go when the best humans could not beat the game programs. At a certain point, the number of fields in which AGI has been achieved will far outweigh the fields in which it hasn't. When that happens, the "General" in AGI will have been attained.

picksalot
Author

A lot of people are saying that Q* is some product of A* and Q-learning, but I think that mathematically inclined scientists are using a more formal application of _* than this. I would guess it is a generalization of Q-learning, in the same way that A* is a generalization of 'A': Dijkstra's algorithm. Maybe it involves graph search, but that is probably coincidental to the name. Pretty much everything these days involves graph search.

homelessrobot
Author

I think Q* must be OPEN SOURCE for the benefit of humanity, not only for big companies.

jayhu
Author

Q* is the optimal route in Q-learning.

BooleanDisorder
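
Editor's note: in standard reinforcement learning notation, Q* usually denotes the optimal action-value function, and the best action in each state follows from it. The symbols below (state s, action a, reward r, next state s', discount factor γ) are the conventional textbook ones, not something stated in the video, and whether OpenAI's "Q*" refers to this quantity is speculation.

```latex
% Bellman optimality equation for the optimal action-value function
Q^{*}(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \,\right]

% Acting greedily with respect to Q^{*} gives an optimal policy
\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```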
Author

If it is just an optimization of training, I don't see how it unlocks abstract thinking. If it is actually another multi-model approach, bandwidth will be a limiting factor. But I think your guesses are not far off; OpenAI has focused on training more than anything else from the start. That is how you make your product look like a breakthrough without an actual breakthrough.

utkua
Author

Very well organized and informative presentation.

nickstaresinic
Author

I was waiting for your video, Jason! Thanks! Have you done any Monte Carlo or genetic algorithms? My guess is that Q* is a similar process, but done at inference or as a precached inference.

agenticmark
Author

Dr Jim is the shit. I will read anything his name is on.

agenticmark
Author

Q is for question and * is for repeat, so if you synthesize a lot of answers you get a generally intelligent answer. My noob opinion.

csabaczcsomps
Author

Hey, can you please make a video on detecting significant insights using reinforcement learning?
I am curious about getting a model to teach itself the irregular patterns that need to be classified, using reinforcement learning.

infinite_interestzz
Author

How does the reward system work for reinforcing behavior beyond Pavlovian bell sounds that signal approval?

Laurie-egct
Author

In the science, technology, and engineering world,
no single thing (a physical entity, or just an idea) has two or more names.
Each one is represented by a single name.
That is not so in the marketing world, where plenty of things share a name or have several names.

Q* and RLHF both belong to the science world,
so they must point to and represent different ideas.

IMO

dwikristianto
Author

Q is blowing up and it's not even Monday on cable....

andrewcampbell
Author

I still don't understand Q*, although I am a little clearer about AlphaGo.

BR-hiyt
Author

Can someone explain why Google didn't figure this out, despite producing so much groundbreaking research in the last decade?

MuzhiLi