OpenAI o1's New Paradigm: Test-Time Compute Explained

What the latest hype around Test-Time Compute is about, and why it's mid

Check out NVIDIA's suite of Training and Certification here:
You can use the code “BYCLOUD” at checkout for 10% off!

Check out my newsletter:

Test Time Compute by DeepMind

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Language Models Learn to Mislead Humans via RLHF

Chain-of-Thought Reasoning Without Prompting

Larger and more instructable language models become less reliable

Let's Think Dot by Dot: Hidden Computation in Transformer Language Models

This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Ben Shaener, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Owen Ingraham, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Penumbraa, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth, Thipok Tham, Clayton Ford, Theo, Handenon, Diego Silva, mayssam, Kadhai Pesalam, Tim Schulz

[Music] massobeats - floral
[Video Editor] @Askejm
Comments

Let me know if you guys want a dive into the methodologies of TTC; there are a lot of new papers/implementations coming out every day lol (entropix is a cool one)

bycloudAI

OpenAI went from extremely secretive closed-source for profit to even more secretive closed-source for profit. Truly revolutionary change.

lbgstzockt

One of the chains of thought felt like doing an A* search over all possible answers

Guedez
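Loosely sketching the A* analogy from the comment above: treat partial answers as search states and always expand the most promising one first. A toy best-first search in Python (the `expand` and `score` functions below are made-up stand-ins for illustration, not anything o1 actually does):

```python
import heapq

def best_first_search(start, expand, score, is_goal, max_steps=1000):
    """Best-first search over partial answers.

    expand(state) -> iterable of successor states
    score(state)  -> estimated cost to a solution (lower is better),
                     playing the role of the A*-style heuristic
    """
    frontier = [(score(start), start)]
    seen = {start}
    for _ in range(max_steps):
        if not frontier:
            return None
        _, state = heapq.heappop(frontier)  # pop the most promising state
        if is_goal(state):
            return state
        for nxt in expand(state):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (score(nxt), nxt))
    return None

# Toy example: build the string "42" one character at a time.
target = "42"
result = best_first_search(
    "",
    expand=lambda s: [s + c for c in "0123456789"] if len(s) < len(target) else [],
    score=lambda s: sum(a != b for a, b in zip(s, target)) + (len(target) - len(s)),
    is_goal=lambda s: s == target,
)
print(result)  # -> 42
```

The heuristic here (mismatched characters plus remaining length) steers the search straight to the goal; with real chains of thought, the hard part is that no such cheap, reliable scorer exists.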

I don't understand why you're so insistent that using RL to learn reasoning can't cause new knowledge to be gained. You're implicitly assuming that if the model knows A, and that A implies B, then the model must already know B. But that's not true. The model knows the rules of chess, and these rules imply whatever the optimal strategy is, but it definitely doesn't know that optimal strategy. It may come to learn it (or approximations of it) through RL, though, as AlphaZero and similar did.

XetXetable

Your channel is like Twitter, but only the good part. I love it

rawallon

Glad to see the original editing approach back.

Terenfear

Fun fact: I once spent 3-4 days trying to fix a single SQLite bug while debugging with AI

BloomDevelop

RLHF, or in other words: LGTM, ship it to prod.

shApYT

Kinda reminds me of how chess bots like Stockfish explore multiple potential outcomes to find the best possible move

GIRcode

Thank you for giving us a healthy level of scepticism about the current AI models.

vincent_hall

a) Subscribed after 1 minute;
b) I really like this almost perfect amount of quick things on the screen that I can actually understand and have (just barely enough) time to take in! Wow;
c) The jokes are good; they made me smile at least 5 times.

beautifulcursecalledmusic

So basically they found out that giving the layman a bit more time to solve an easier problem can be more cost-effective than giving the smart guy a menial task, and that it is also worth giving the smart guy more time to train so he can solve harder problems more effectively...

Haven't we already known this for hundreds if not thousands of years?

AidanNaut
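The trade-off in the comment above can be put into toy numbers: sampling a cheap model several times and keeping a correct answer (assuming you have some way to verify one) can beat a single expensive call. Every cost and success rate below is invented purely for illustration:

```python
# Toy cost model (all numbers are made-up assumptions, not measurements):
# a small model at 1x cost per call vs. a large model at 20x.
small_cost, large_cost = 1.0, 20.0
p_small, p_large = 0.6, 0.9  # assumed single-shot success rates on an easy task

def p_best_of_n(p, n):
    """Chance that at least one of n independent samples is correct."""
    return 1 - (1 - p) ** n

n = 5
# Small model, 5 samples: ~0.99 success at total cost 5.
print(p_best_of_n(p_small, n), small_cost * n)
# Large model, 1 sample: 0.9 success at cost 20.
print(p_large, large_cost)
```

The catch, which the video's scepticism points at, is the "at least one is correct" step: without a cheap verifier or a good voting scheme, the extra samples don't help.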

"Bart say the line!"

*Sigh* "The bitter lesson strikes again"

John_YT

I just hope this kick-starts inference backends like ollama, kobold, ooba, tabby, or any other into having native support for test-time compute approaches. It would be nice to query some fast small model like a 12B Mistral and have it take longer but think its way through to a better answer.

..

Okay, this explains why higher temp and top_p sometimes give better results 😮

Originalimoc
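For reference, a minimal sketch of what `temperature` and `top_p` actually do at sampling time, using a toy logit dictionary instead of a real model (the token names and logit values are arbitrary):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Temperature + nucleus (top-p) sampling over a {token: logit} dict."""
    # Temperature rescales logits: >1 flattens the distribution, <1 sharpens it.
    scaled = {t: l / temperature for t, l in logits.items()}
    # Softmax (shifted by the max for numerical stability).
    m = max(scaled.values())
    exps = {t: math.exp(l - m) for t, l in scaled.items()}
    z = sum(exps.values())
    probs = sorted(((t, e / z) for t, e in exps.items()), key=lambda kv: -kv[1])
    # Nucleus: keep the smallest set of top tokens whose mass reaches top_p.
    kept, total = [], 0.0
    for t, p in probs:
        kept.append((t, p))
        total += p
        if total >= top_p:
            break
    # Sample proportionally from the kept set.
    r = rng.random() * total
    for t, p in kept:
        r -= p
        if r <= 0:
            return t
    return kept[-1][0]

logits = {"the": 2.0, "a": 1.0, "dot": 0.1}
print(sample_next_token(logits, temperature=0.8, top_p=0.9))
```

Higher temperature and top_p let lower-probability tokens through, which is exactly the extra diversity that sampling-based test-time compute methods exploit.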

So, in my current system that uses an LLM: after watching this video, I added a setTimeout that flips a bool to true after 8 seconds, and a while loop that runs inference over and over on a "thought" given the current environment state while the bool is false. It thinks for about 8 seconds and spits out about 4 "thoughts" in that time. After stuffing my speaker agent's context with those thoughts, it really does improve the quality of the final output. I'm just curious, did anyone catch how they calculate how long to "think" for?

tvwithtiffani
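The timed "thinking" loop described in the comment above can be sketched as follows (a deadline check stands in for the setTimeout/bool pair, and `fake_llm` is a hypothetical stand-in for a real LLM call; the budget is just a parameter):

```python
import time

def think_for(budget_s, generate_thought, context):
    """Keep generating short 'thoughts' until the time budget runs out,
    then return them all for stuffing into the final prompt."""
    thoughts = []
    deadline = time.monotonic() + budget_s
    while time.monotonic() < deadline:
        thoughts.append(generate_thought(context, thoughts))
    return thoughts

# Hypothetical stand-in for an actual LLM call; substitute your client here.
def fake_llm(context, prior_thoughts):
    time.sleep(0.01)  # simulate inference latency
    return f"thought {len(prior_thoughts) + 1} about {context!r}"

print(think_for(0.05, fake_llm, "user question"))
```

`time.monotonic` is used instead of `time.time` so the deadline is immune to wall-clock adjustments. As for how o1 decides its own thinking budget, that isn't something the video (or OpenAI) spells out.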

Do the studies that compare o1 vs GPT-4 utilize a chain-of-thought prompt for the latter? Because if not, the discrepancy in performance seems arbitrary.

johnmcclane

Totally agree, it's mid. DeepMind already did the most on this

PieroSavastano

Thanks! Very interesting about eng not improving.

Hmework

Also, what is interesting about silly things like counting the number of r's in "strawberry": it can easily be done if you instead give the AI something more solid to work with, such as telling it to use its code interpreter/generation capabilities. That means 4o right now can technically count r's better than o1, because it can run simple Python code. This is the difference between running a nondeterministic model and asking it to leverage a tool specifically made to be completely deterministic. 4o being able to use code generation and an interpreter is a more massive advantage than anything o1 can do with its limited capabilities. Instead, OpenAI will need to implement tools for o1 to interact with that can give more solid, deterministic outcomes, so that when o1 does its chain of thought it can simply think: "Hey, I am unsure, let me query a tool that outputs something reliable, or consult a verifiable database of information."

acters