What Is Q*? The Leaked AGI BREAKTHROUGH That Almost Killed OpenAI

Update: ROY-ders 🤣

In this video, I break down every piece of information we have about Q*, the revolutionary AGI breakthrough that has been leaked from OpenAI. Everyone in the AI community has been scrambling to figure out what it is, and I’ve collected everything I can on the subject. So, what is Q*? Is it AGI?

Enjoy 🙂

Join My Newsletter for Regular AI Updates 👇🏼

Need AI Consulting? ✅

Rent a GPU (MassedCompute) 🚀
USE CODE "MatthewBerman" for 50% discount

My Links 🔗

Media/Sponsorship Inquiries 📈

Links:

Comments
Author

So...is Q* the ingredient for AGI? What do you think?

matthew_berman
Author

The notation "Q*" is often used in the context of reinforcement learning, a subfield of artificial intelligence. In reinforcement learning, the Q-value represents the expected cumulative reward of taking a particular action in a specific state and following a certain policy.
Q* specifically refers to the optimal Q-value, which represents the maximum expected cumulative reward achievable by following the optimal policy. The optimal policy is the strategy that maximizes the expected cumulative reward over time.
Mathematically, for a given state 's' and action 'a', the optimal Q-value is denoted as Q*(s, a). The optimal Q-value satisfies the Bellman optimality equation, which is a fundamental equation in reinforcement learning.

In summary, when you see Q* in the context of AI and reinforcement learning, it generally refers to the optimal Q-value, representing the maximum expected cumulative reward for taking a specific action in a given state while following the optimal policy.
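
For reference, the Bellman optimality equation mentioned above can be written in standard textbook notation (added here for completeness, not quoted from the video) as:

```latex
Q^*(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \;\middle|\; s, a \,\right]
```

where r is the immediate reward, γ ∈ [0, 1) is the discount factor, and s' is the state reached after taking action a in state s.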

Think of it this way...

Imagine you're playing a video game, and you're trying to figure out the best way to make your character score the highest points. In AI, especially in a part called reinforcement learning, we use something called "Q-values" to help the computer learn the best moves. Each action you can take in the game has a Q-value, which is like a score. The higher the Q-value, the better that action is expected to be. Q* just means the absolute best score, like the highest possible score you could get for a specific action in a certain situation.

So, when people talk about Q*, they're basically saying, "Hey, let's figure out the best way to play the game and get the most points possible." It's a way for computers to learn and make smart decisions in different situations.
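
To make the video-game analogy concrete, here is a minimal tabular Q-learning sketch in Python. The toy environment, reward, and hyperparameters are invented purely for illustration; this shows the generic Q-learning update, not anything specific to OpenAI's Q*.

```python
import random

# Toy environment: states 0..4 on a line; action 0 moves left, action 1 moves right.
# Reaching state 4 yields reward 1 and ends the episode.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # learning rate, discount factor, exploration rate
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward, s2 == N_STATES - 1

for _ in range(500):  # training episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy: mostly pick the action with the highest Q-value.
        if random.random() < EPS:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning update: nudge Q(s, a) toward r + gamma * max_a' Q(s', a').
        # The fixed point of this update is the optimal value, Q*(s, a).
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy policy should be "always move right".
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

Running it prints the learned greedy action per state, which for this toy is "move right" everywhere, since that is the path to the reward.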

ElioRose
Author

"how quickly do you want to destroy humanity?"


"yes"

JustMyTwoCentz
Author

This is great content. I really enjoy the comfortably unassuming pace you take to navigate the material while hitting the raw published papers and communications from subject-matter experts who are respected by the community, all with relevant and interesting insights, palpable enthusiasm, and respectful explanations for viewers who are out of the loop. And you clearly prepare before turning the camera on. Subscribed.

chaseoneill
Author

Your most wide-ranging, far-reaching, ambitious video yet, by an order of magnitude. A great round-up of the state of the art from professionals and amateurs alike. Thank you for not underestimating your audience. Bravo.

jpandrews
Author

This smells a lot like a marketing strategy...

AlitaNapol
Author

It's a reinforcement learning technique, also called Q-learning. LLMs just predict the next token, and they cannot plan; Q-learning gives them something like that ability. I guess they found a way to plan and verify "step by step" or via an "inner monologue" so that they can converge on the correct answer, which would give them the ability to do math without failing.
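
As a purely speculative sketch of what "plan and verify step by step" could look like, the snippet below samples several candidate next reasoning steps and keeps the one a verifier scores highest. Both propose_steps and score_step are hypothetical placeholders (stand-ins for an LLM sampler and a process reward model), not anything confirmed about Q*.

```python
def propose_steps(solution_so_far: list[str], n: int = 4) -> list[str]:
    # Placeholder: a real system would sample n candidate next reasoning
    # steps from a language model, conditioned on the solution so far.
    return [f"candidate step {i} (after {len(solution_so_far)} steps)" for i in range(n)]

def score_step(solution_so_far: list[str], step: str) -> float:
    # Placeholder: a real system would use a learned verifier / process
    # reward model to judge how promising this step is.
    return float(-len(step))  # dummy score, just so the example runs

def solve(max_steps: int = 5) -> list[str]:
    solution: list[str] = []
    for _ in range(max_steps):
        candidates = propose_steps(solution)
        # Greedy one-step lookahead: keep the candidate the verifier likes best.
        best = max(candidates, key=lambda c: score_step(solution, c))
        solution.append(best)
    return solution

print(solve())
```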

foreignconta
Author

Reuters is pronounced Roy'-ters, not Roo'-ters.

genebeidl
Author

There's a lot of detective work that went into preparing this video... thanks for spending the time and effort. Great stuff.

marcfruchtman
Author

Oh boy, this is when reality is starting to surpass science fiction 😯

cristian
Author

"Reuters" is spelled Roy-ters. It comes from German language, like Euler - is pronounced Oyler.

Ownedyou
Author

In the context of reinforcement learning, Q* refers to the optimal action-value function, which gives the maximum expected reward for an action taken in a given state, considering an optimal policy.

wtpqqcq
Author

00:03 Q* is the AI breakthrough that almost killed OpenAI
02:08 OpenAI almost shut down due to fear of a dangerous AI discovery
06:16 OpenAI is integrating self-learning techniques into a large language model.
08:21 The paper discusses generating step-by-step rationales to improve language model performance on complex reasoning tasks.
12:26 Q* is a breakthrough in AGI with implications for process supervision and active learning.
14:33 Language models can benefit from self-play and look ahead planning.
18:28 Understanding mathematical reasoning and solving mathematical proofs can have a significant impact on various aspects of the world.
20:23 A proof that P equals NP could have unexpected consequences and implications for computational complexity.
24:08 Q* is OpenAI's attempt at planning.
25:59 Self-improvement is a key aspect of AGI
29:30 The main challenge in open language modeling is the lack of a reward criterion, making self-improvement difficult.
31:11 Large language models can be improved by self-play and incorporating agent feedback.
34:50 Q* is a potentially groundbreaking AI system that scares people.

Codescord
Author

🎯 Key Takeaways for quick navigation:

00:00 🤖 Introduction to Q* and the controversy around it
- Q* is a mysterious AI breakthrough that led to controversy within OpenAI.
- A letter of concern was written by staff researchers about Q*, contributing to the firing of Sam Altman.
02:06 🛡️ Concerns and the board's reaction
- The OpenAI board was deeply concerned about the discovery of Q*.
- They were willing to shut down the company to prevent its release due to safety concerns.
03:44 💼 Speculation about Q* and its potential breakthroughs
- Q* may involve advanced mathematical reasoning and solving complex problems.
- The use of process reward models (PRMs) and self-play in AI development.
- The significance of synthetic data in expanding data sets.
19:56 🌐 Implications of Q* for the world
- The potential consequences of AI understanding mathematical proofs.
- The impact on encryption, physics, chemistry, and various fields.
- Hypothetical scenarios related to P vs. NP and their consequences.

Made with HARPA AI

HoustonTyme
Author

First off, that "leaked letter from OpenAI" is likely BS. I highly doubt Sam would have been dumb enough to accelerate launching this, knowing the consequences it could have.

Second, synthetic data is definitely the future. My hypothesis is that they'll have it grow its own knowledge database by using a physics engine or the real world as a sandbox to explore its understanding and subsequently predict outcomes. That's the only way I see AGI becoming reality.

Quantum_Nebula
Author

You are thorough and clear in your presentation of this complex information. Thank you for all this work to spread this important knowledge. It is tremendously helpful to people like me who are interested, but lack technical background.

LoisSharbel
Author

It’s unlikely that even the most advanced AI will break encryption as you’ve described. Most cryptographers and computer scientists believe that certain mathematical problems are intrinsically intractable and not solvable in a reasonable time frame. However, until it is proven that P ≠ NP, this remains an open question. Perhaps AI will help resolve the P vs. NP problem. In the weird case that AI proves P = NP, your original suggestion, which I discounted, will turn out to be true!

zgrunschlag
Author

As for encryption, I am happy to inform you that most big data centers' security, including cloud-based security and firewalls, uses much stronger encryption than AES-128 (which is actually kind of old). On top of that, you can double-encrypt and add a hash for integrity.
Example:
You first encrypt the data using AES-256 to produce an encrypted message.
Then, you calculate the SHA-256 hash of the encrypted message to create a unique "fingerprint" for the encrypted data.
Finally, you attach the hash value to the encrypted message as a sort of digital "signature" or checksum, to verify the integrity of the encrypted message when it's decrypted.
By using both AES-256 and SHA-256, you're essentially "double-locking" the data, making it more secure and harder to tamper with.


So EVEN if Q-Star could break AES-128, it would not break the internet, though it is concerning. I thought I would let you know this so you can sleep better. 🙂
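
A minimal sketch of the encrypt-then-hash scheme described above, assuming Python with the third-party `cryptography` package for AES-256 (CTR mode is picked here just for illustration) and the standard library's hashlib for SHA-256:

```python
import os
import hashlib
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_then_hash(plaintext: bytes, key: bytes):
    """AES-256 encrypt, then SHA-256 the ciphertext as an integrity fingerprint."""
    nonce = os.urandom(16)                             # fresh nonce per message
    encryptor = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    fingerprint = hashlib.sha256(ciphertext).digest()  # the attached "checksum"
    return nonce, ciphertext, fingerprint

key = os.urandom(32)                                   # a 32-byte key selects AES-256
nonce, ct, fp = encrypt_then_hash(b"top secret message", key)
assert hashlib.sha256(ct).digest() == fp               # verify integrity before decrypting
```

One caveat: a bare SHA-256 over the ciphertext only catches accidental corruption; an attacker who can swap the ciphertext can recompute and swap the hash too, which is why production systems typically use a keyed HMAC or an authenticated mode such as AES-GCM for the integrity step.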

senju
Author

This is one of the best videos I've seen on this subject. It's clear that you put a lot of time and effort into making such a high-quality video. Thank you!

bradleypout
Author

Great video. Just an FYI, Reuters is pronounced Royters. I have made the same mispronunciation many times myself.

rickrichman