OpenAI o1's New Paradigm: Test-Time Compute Explained

What the latest hype around Test-Time Compute is about, and why it's mid

Check out NVIDIA's suite of Training and Certification here:
You can use the code “BYCLOUD” at checkout for 10% off!

Check out my newsletter:

Test Time Compute by DeepMind

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Language Models Learn to Mislead Humans via RLHF

Chain-of-Thought Reasoning Without Prompting

Larger and more instructable language models become less reliable

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models

This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Ben Shaener, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford, Theo, Handenon, Diego Silva, mayssam, Kadhai Pesalam, Tim Schulz

[Music] massobeats - floral
[Video Editor] @Askejm
Comments

Let me know if you guys want a dive into the methodologies of TTC; there are a lot of new papers/implementations coming out every day lol (entropix is a cool one)

bycloudAI

OpenAI went from extremely secretive closed-source for profit to even more secretive closed-source for profit. Truly revolutionary change.

lbgstzockt

One of the chains of thought felt like doing an A* search over all possible answers

Guedez
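Loosely sketching the A* analogy from the comment above: treat partial answers as search states and always expand the most promising one first. A toy best-first search in Python (the `expand` and `score` functions below are made-up stand-ins for illustration, not anything o1 actually does):

```python
import heapq

def best_first_search(start, expand, score, is_goal, max_steps=1000):
    """Best-first search over partial answers.

    expand(state) -> iterable of successor states
    score(state)  -> estimated cost to a solution (lower is better),
                     playing the role of the A*-style heuristic
    """
    frontier = [(score(start), start)]
    seen = {start}
    for _ in range(max_steps):
        if not frontier:
            return None
        _, state = heapq.heappop(frontier)  # pop the most promising state
        if is_goal(state):
            return state
        for nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt))
    return None

# Toy example: build the string "42" one character at a time.
target = "42"
result = best_first_search(
    "",
    expand=lambda s: [s + c for c in "0123456789"] if len(s) < len(target) else [],
    score=lambda s: sum(a != b for a, b in zip(s, target)) + (len(target) - len(s)),
    is_goal=lambda s: s == target,
)
print(result)  # -> 42
```

The heuristic here (mismatched characters plus remaining length) steers the search straight to the goal; with real chains of thought, the hard part is that no such cheap, reliable scorer exists.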

I don't understand why you're so insistent that using RL to learn reasoning can't cause new knowledge to be gained. You're implicitly assuming that if the model knows A, and that A implies B, then the model must already know B. But that's not true. The model knows the rules of chess, and these rules imply whatever the optimal strategy is, but it definitely doesn't know that optimal strategy. It may come to learn it (or approximations of it) through RL, though, as AlphaZero and similar did.

XetXetable

Your channel is like Twitter, but only the good part. I love it

rawallon

Glad to see the original editing approach back.

Terenfear

Fun fact: I once spent 3-4 days trying to fix a single SQLite bug while debugging with AI

BloomDevelop

RLHF, or in other words: LGTM, ship it to prod.

shApYT

Kinda reminds me of how chess bots like Stockfish explore multiple potential outcomes to find the best possible move

GIRcode

Thank you for giving us a healthy level of scepticism about the current AI models.

vincent_hall

a) Subscribed after 1 minute;
b) I really like this almost perfect amount of quick things on the screen that I can actually understand and have (just barely enough) time to take in! Wow;
c) The jokes are good; they made me smile at least 5 times.

beautifulcursecalledmusic

So basically they found out that giving the layman a bit more time to solve an easier problem can be more cost-effective than giving the smart guy a menial task, and that it is also worth giving the smart guy more time to train so he can solve harder problems more effectively...

Haven't we already known this for hundreds if not thousands of years?

AidanNaut
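The trade-off in the comment above can be put into toy numbers: sampling a cheap model several times and keeping a correct answer (assuming you have some way to verify one) can beat a single expensive call. Every cost and success rate below is invented purely for illustration:

```python
# Toy cost model (all numbers are made-up assumptions, not measurements):
# a small model at 1x cost per call vs. a large model at 20x.
small_cost, large_cost = 1.0, 20.0
p_small, p_large = 0.6, 0.9  # assumed single-shot success rates on an easy task

def p_best_of_n(p, n):
    """Chance that at least one of n independent samples is correct."""
    return 1 - (1 - p) ** n

n = 5
# Small model, 5 samples: ~0.99 success at total cost 5.
print(p_best_of_n(p_small, n), small_cost * n)
# Large model, 1 sample: 0.9 success at cost 20.
print(p_large, large_cost)
```

The catch, which the video's scepticism points at, is the "at least one is correct" step: without a cheap verifier or a good voting scheme, the extra samples don't help.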

"Bart say the line!"

*Sigh* "The bitter lesson strikes again"

John_YT

I just hope this kick-starts inference backends like ollama, kobold, ooba, tabby, or any other into having native support for test-time compute approaches. It would be nice to query some fast small model like a 12B Mistral and have it take longer but think its way through to a better answer.

..

Okay, this explains why higher temp and top_p sometimes give better results 😮

Originalimoc
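For reference, a minimal sketch of what `temperature` and `top_p` actually do at sampling time, using a toy logit dictionary instead of a real model (the token names and logit values are arbitrary):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Temperature + nucleus (top-p) sampling over a {token: logit} dict."""
    # Temperature rescales logits: >1 flattens the distribution, <1 sharpens it.
    scaled = {t: l / temperature for t, l in logits.items()}
    # Softmax (shifted by the max for numerical stability).
    m = max(scaled.values())
    exps = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(exps.values())
    probs = sorted(((t, e / z) for t, e in exps.items()), key=lambda kv: -kv[1])
    # Nucleus: keep the smallest set of top tokens whose mass reaches top_p.
    kept, total = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        total += p
        if total >= top_p:
            break
    # Sample proportionally from the kept set.
    r = rng.random() * total
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]

logits = {"the": 2.0, "a": 1.0, "dot": 0.1}
print(sample_next_token(logits, temperature=0.8, top_p=0.9))
```

Higher temperature and top_p let lower-probability tokens through, which is exactly the extra diversity that sampling-based test-time compute methods exploit.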

So, in my current system that uses an LLM: after watching this video, I added a setTimeout that flips a bool to true after 8 seconds, and a while loop that runs inference over and over on a "thought" given the current environment state while the bool is false. It thinks for about 8 seconds and spits out about 4 "thoughts" in that time. After stuffing my speaker agent's context with those thoughts, it really does improve the quality of the final output. I'm just curious, did anyone catch how they calculate how long to "think" for?

tvwithtiffani
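The timed "thinking" loop described in the comment above can be sketched as follows (a deadline check stands in for the setTimeout/bool pair, and `fake_llm` is a hypothetical stand-in for a real LLM call; the budget is just a parameter):

```python
import time

def think_for(budget_s, generate_thought, context):
    """Keep generating short 'thoughts' until the time budget runs out,
    then return them all for stuffing into the final prompt."""
    thoughts = []
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        thoughts.append(generate_thought(context, thoughts))
    return thoughts

# Hypothetical stand-in for an actual LLM call; substitute your client here.
def fake_llm(context, prior_thoughts):
    time.sleep(0.01)  # simulate inference latency
    return f"thought {len(prior_thoughts) + 1} about {context!r}"

print(think_for(0.05, fake_llm, "user question"))
```

`time.monotonic` is used instead of `time.time` so the deadline is immune to wall-clock adjustments. As for how o1 decides its own thinking budget, that isn't something the video (or OpenAI) spells out.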

Do the studies that compare o1 vs GPT-4 utilize a chain-of-thought prompt for the latter? Because if not, the discrepancy in performance seems arbitrary.

johnmcclane

Totally agree, it's mid. DeepMind already did the most on this

PieroSavastano

Thanks! Very interesting about eng not improving.

Hmework

Also, what is interesting about silly things like counting the number of r's in "strawberry": it can easily be done if you instead give the AI something more solid to work with, such as telling it to use its code interpreter/generation capabilities. That means 4o right now can technically count r's better than o1, because it can run simple Python code. This is the difference between running a nondeterministic model and asking it to leverage a tool specifically made to be completely deterministic. 4o being able to use code generation and an interpreter is a more massive advantage than anything o1 can do with its limited capabilities. Instead, OpenAI will need to implement tools for o1 to interact with that can give more solid, deterministic outcomes, so that when o1 does its chain of thought it can simply think: "Hey, I am unsure, let me query a tool that outputs something reliable, or consult a verifiable database of information."

acters