What is Q* | Reinforcement learning 101 & Hypothesis

#chatgpt #gpt4 #gpt5 #ai #artificialintelligence #tutorial #stepbystep #openai #llm #chatgpt #largelanguagemodels #largelanguagemodel #agent #reinforcementlearning

Comments

Anything else I missed about Q*? Leave a comment and let me know!

AIJasonZ

That AlphaGo documentary remains so good, even a few years later. They found the human empathy and passion in a cold technical challenge, all without any narration. It gets me excited about hard tech.

TheDessertFaux

Really well put together, Jason, with use of interviews and clips.

LukePuplett

Great overview! Jason, your videos on the AI topic are the best!

00:00 🤖 *"Q Star" is generating a lot of discussion in the AI community, and it's associated with OpenAI's recent actions, but its exact nature remains speculative.*
01:08 🎮 *Reinforcement learning is a machine learning framework where an agent learns from trial and error, aiming to maximize future rewards. It involves policy networks and value networks.*
03:25 🧠 *Reinforcement learning allows AI agents to self-play and discover new strategies, as demonstrated by DeepMind's achievements in games like Breakout and AlphaGo.*
08:01 📚 *There's speculation that "Q Star" could involve using policy networks and value networks, similar to AlphaGo, to improve reasoning and logic in large language models like GPT.*
11:14 🐍 *You can experiment with reinforcement learning in simple games with open-source projects, even if you're new to the field.*

HarpaAI
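
The trial-and-error loop summarized at 01:08 can be sketched with tabular Q-learning, the classic algorithm behind the "Q" in Q-learning. This is a minimal illustration, not the code shown in the video: the corridor environment, the function names `train_q_table` and `greedy_policy`, and every hyperparameter value are assumptions chosen for the example.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1. The agent starts at state 0 and receives
    reward 1 only when it steps into the rightmost state, which ends
    the episode. Actions: 0 = step left, 1 = step right.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy: explore occasionally (and on ties),
            # otherwise take the action with the higher estimate.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Q-learning update: move the estimate toward the reward
            # plus the discounted best estimate of the next state.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per non-terminal state (1 = right, 0 = left)."""
    return [0 if left > right else 1 for left, right in q[:-1]]
```

Calling `greedy_policy(train_q_table())` on this toy corridor should yield a policy that steps right from every state, since that is the only way to reach the reward.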

Thanks for organising the insights! This hypothesis is very exciting.

jasonfinance

A clear definition of AGI has been difficult to find. Temporarily constraining it to a specific field for evaluation might be helpful. For instance, AGI was achieved in Chess and Go when the best humans could not beat the game programs. At a certain point, the number of fields in which AGI has been achieved will far outweigh the fields in which it hasn't. When that happens, the "General" in AGI will have been attained.

picksalot

A lot of people are saying that Q* is some product of A* and Q-learning, but I think that mathematically inclined scientists are using a more formal application of _* than this. I would guess it is a generalization of Q-learning, in the same way that A* is a generalization of 'A': Dijkstra's algorithm. Maybe it involves graph search, but that is probably coincidental to the name. Pretty much everything these days involves graph search.

homelessrobot

I think Q* must be OPEN SOURCE for the benefit of humanity. Not only for big companies.

jayhu

Q* is the optimal route in Q-learning.

BooleanDisorder
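
For readers following the thread: in standard reinforcement-learning notation, Q* usually denotes the optimal action-value function, from which the optimal choice in each state follows. A sketch of that textbook definition is below. The symbols (state s, action a, reward r, next state s', discount factor γ, policy π) are the usual conventions and are assumed here; whether OpenAI's reported "Q*" refers to this quantity is not confirmed anywhere in the video or the comments.

```latex
% Optimal action-value: the best expected discounted return achievable
% after taking action a in state s.
Q^*(s, a) = \max_{\pi} \, \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t} \,\middle|\, s_0 = s,\ a_0 = a,\ \pi \right]

% Bellman optimality equation satisfied by Q^*.
Q^*(s, a) = \mathbb{E}\left[ r + \gamma \max_{a'} Q^*(s', a') \,\middle|\, s, a \right]

% Acting greedily with respect to Q^* gives an optimal policy.
\pi^*(s) = \arg\max_{a} Q^*(s, a)
```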

If it is just an optimization of training, I don't see how it unlocks abstract thinking. If it is actually another multi-model approach, bandwidth will be a limiting factor. But I think your guesses are not far off: OpenAI has focused on training more than anything else from the start. That is how you make your product look like a breakthrough without an actual breakthrough.

utkua

Very well organized and informative presentation.

nickstaresinic

I was waiting for your video, Jason! Thanks! Have you done anything with Monte Carlo methods or genetic algorithms? My guess is Q* is a similar process, but done at inference time or as pre-cached inference.

agenticmark

Dr Jim is the shit. I will read anything his name is on.

agenticmark

Q is for question and * means repeat, so if you synthesize a lot of answers you get a generally intelligent answer. My noob opinion.

csabaczcsomps

Hey, can you please make a video on detecting significant insights using reinforcement learning?
I'm curious about getting a model to learn by itself, through reinforcement learning, which irregular patterns need to be classified.

infinite_interestzz

How does the reward system work for reinforcing behavior beyond Pavlovian bell sounds that signal approval?

Laurie-egct

In the science, technology and engineering world,
no single thing (a physical entity or just an idea) has two or more names.
Each one is represented by a single name.
But not in the marketing world, where plenty of things share a name or have several names.

Q* and RLHF both come from the science world,
so they must refer to different ideas.

IMO

dwikristianto

Q is blowing up and it's not even Monday on cable....

andrewcampbell

Still don't understand Q* although I am a little clearer about AlphaGo

BR-hiyt

Can someone explain why Google didn't figure this out despite producing so much groundbreaking research in the last decade?

MuzhiLi
