o3 Model by OpenAI TESTED ($1800+ per task)

New o3 model tested. Detailed performance test data analyzed. A first interpretation of test-time compute performance and the new architectural complexity of o3.

First performance results from the new o3 model by OpenAI, published less than 10 hours ago.
Hint: o3 is not AGI.

Correction: The x-axis of the cost-per-task diagram is logarithmic, so the per-task costs for the o3 model are higher than a linear reading of the chart suggests. Smile.

@OpenAI
Early access for safety testing

#openai
#newtechnology
#o3
#education
#reasoning
#aiagents
Comments

As noted by a viewer, the x-axis representing the cost per task for o3 is logarithmic, so it will be ... slightly more expensive. Smile.

codeAI

Your calculation of the cost is wrong, as the x-axis is a logarithmic scale and the next tick after 1k is 10k, not 2k. Hence the cost per task of o3 high looks more like $7k-9k.
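
For illustration, a point's value on a log axis comes from interpolating its log position, not its linear one; the tick fractions below are assumptions for the sketch, not measurements off the chart:

import math

def log_axis_value(tick_lo, tick_hi, fraction):
    """Interpolate a point that sits `fraction` of the way (measured
    linearly on the plot) between two ticks of a logarithmic axis."""
    log_lo, log_hi = math.log10(tick_lo), math.log10(tick_hi)
    return 10 ** (log_lo + fraction * (log_hi - log_lo))

# On a log axis, a point that sits most of the way from the $1k tick
# toward the $10k tick is far above $2k:
print(log_axis_value(1_000, 10_000, 0.85))  # ~7079
print(log_axis_value(1_000, 10_000, 0.90))  # ~7943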

rijmsszfbsrewtwo

That MIT paper on TTT (test-time training) is probably the key to the next-level algorithmic unlock, like TTC (test-time compute) was.
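
For context, test-time training means taking a few gradient steps on each task's own demonstration pairs before predicting. A minimal sketch of the idea in PyTorch, with a toy model and toy data rather than the MIT paper's actual setup:

import copy
import torch
import torch.nn as nn

def test_time_train(base_model, demo_x, demo_y, steps=20, lr=1e-3):
    """Clone the base model and fine-tune the clone on this one task's
    demonstration pairs before predicting (the core TTT idea)."""
    model = copy.deepcopy(base_model)   # leave the base weights untouched
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(demo_x), demo_y).backward()
        opt.step()
    return model

# Toy usage: adapt a tiny regressor to 3 demonstration pairs.
base = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 4))
adapted = test_time_train(base, torch.randn(3, 4), torch.randn(3, 4))
prediction = adapted(torch.randn(1, 4))  # predict on the task's test input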

SimonNgai-du

I suppose the most important thing is that we already have these models, and the cost per task doesn't matter much, because the cost is mostly driven by GPU watt-hours. Replace those with photonic chips and the cost becomes almost zero.

synthbrain

Energy input will decrease substantially with the use of photonic computing (which is analog). Linear algebra (the crucial part) can already be done, but it is in its early stages. I think this (rather than quantum computing) will be the next revolution in computing. A suitable photonic element can do, e.g., a two-dimensional Fourier transform on as many pixels as fit on the sensor in a single pass.
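
To make that comparison concrete: the 2D Fourier transform costs on the order of N² log N operations digitally but one pass through the optics. A small numpy illustration (the sensor size is an arbitrary assumption):

import numpy as np

image = np.random.rand(1024, 1024)   # a hypothetical 1024x1024 "sensor"
spectrum = np.fft.fft2(image)        # digital 2D FFT over every pixel

# Rough digital operation count, versus a single optical pass:
n = image.shape[0]
ops = n * n * np.log2(n * n)
print(f"~{ops:.1e} operations digitally vs. one pass through a lens")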

winstongludovatz

Remember that ARC-AGI also poses a second challenge: representing spatial visual data as sequential tokens, which is itself quite a monumental task to overcome.
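
To illustrate that second challenge: an ARC grid is a small 2D array of color indices that must be flattened into a 1D token stream before a language model ever sees it. A minimal sketch of one plausible serialization (the row-per-line format here is an assumption, not OpenAI's actual encoding):

def grid_to_tokens(grid):
    """Flatten a 2D ARC grid (rows of color indices 0-9) into
    sequential text, one row per line."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)

def tokens_to_grid(text):
    """Recover the 2D grid from its serialized form."""
    return [[int(ch) for ch in line] for line in text.splitlines()]

grid = [[0, 0, 3],
        [0, 3, 0],
        [3, 0, 0]]                     # a diagonal of color 3
flat = grid_to_tokens(grid)            # "003\n030\n300"
assert tokens_to_grid(flat) == grid    # vertical adjacency now survives
                                       # only as distant stream positions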

ptkjuwb

This is very impressive. As for the cost, it will fall. Someone will figure out what it is about the model that does the reasoning, and then they will build a <10B model that does it. What is done by brute force initially is done by architecture and clever training later on.

mrpocock

Best regards from Germany 👋.

Your recent videos strike the right balance between technical depth and accessibility. Whenever a topic is all Greek to me, I watch one of your videos on it and come away at least a little bit smarter.

Stay on the ball and keep it up 👍

Pasko

OK, here is something I don't understand: the cost now is $1,800 per task, but how much will the same performance cost this time next year, in December 2025? I would think this is the highest the price point is going to get. When people started building computers in the 40s, 50s, and 60s, they could never have imagined the 70s, 80s, and 90s: the idea that an average person could own a computer, or that prices would be so low that medium and small businesses could invest in desktops and laptops. And that doesn't even include the 2000s, 10s, and 20s. Someone will find a way to bring the price down, because the profit motive will push OpenAI or someone else to bring o3 down to o1 prices.

CMDRScotty

My hot take is that this methodology is correct, but it is early. The underlying LLM foundation model needs to be more intelligent, using more/better "intuition" to limit the test-time compute scenarios; however, test-time-compute chain of thought will be optimized, foundation-model intelligence will be optimized, and pairing the two together will drive quality up and price down rapidly. It's important to note that this announcement was about PR and staying ahead of Google...

P.SeudoNym

o3 high: The scale is logarithmic and ends at $10k, but the data point does not reach $10k, just as the axis does not start at 0. Computed from tokens it comes out to $3,440 per task; read from the chart, $3,500-3,600 per task.
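
The token-based part of that estimate is simple arithmetic; the figures below are placeholders chosen to show the calculation, not confirmed OpenAI numbers:

# Hypothetical figures for illustration only.
tokens_per_task = 57_000_000        # assumed reasoning tokens per task
price_per_m_tokens = 60.00          # assumed $ per 1M output tokens

cost_per_task = tokens_per_task / 1_000_000 * price_per_m_tokens
print(f"${cost_per_task:,.0f} per task")  # $3,420 under these assumptions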

hipotures

That's why you don't use one model to do everything. For many tasks you won't need o3 at all, and you can pass its output to another model for further processing.
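
That cascade pattern is straightforward to sketch; the models and the verifier below are stand-ins, not real APIs:

def cascade(task, cheap_model, strong_model, accept):
    """Try the cheap model first; escalate to the expensive model
    only when the verifier rejects the cheap draft."""
    draft = cheap_model(task)
    if accept(task, draft):
        return draft              # most tasks should stop here
    return strong_model(task)     # pay o3-class prices only rarely

# Toy usage with stand-in callables (real ones would call model APIs):
cheap = lambda t: "draft: " + t
strong = lambda t: "verified answer: " + t
accept = lambda t, d: len(t) < 40     # pretend short tasks are easy
print(cascade("summarize this paragraph", cheap, strong, accept))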

jksoftware

The first ARC example is ambiguous, and that is why o3 failed: it produced one possible outcome, but a different one was expected.

szef_fabryki_azbestu

Do you think o3 is tree of thought? That could explain its large jump in accuracy, but also the much larger computational cost at the high end (many more branches). It seems like the natural progression from o1 within a few months.
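
For reference, tree of thought in miniature: expand each kept reasoning path into several candidate next steps, score the partial paths, keep the best few, and repeat. The expand and score functions below are toy stand-ins:

def tree_of_thought(root, expand, score, depth=3, beam=2):
    """Toy beam-style tree-of-thought search. Compute grows with
    depth x branching x beam, which is where the high-end cost
    explosion would come from."""
    frontier = [root]
    for _ in range(depth):
        candidates = [path + [step] for path in frontier
                      for step in expand(path)]
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]
    return max(frontier, key=score)

# Stand-in task: build a sequence of additions that reaches 10.
expand = lambda path: [path[-1] + d for d in (1, 2, 3)]
score = lambda path: -abs(10 - path[-1])
print(tree_of_thought([0], expand, score))  # [0, 3, 6, 9]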

ricosrealm

Three examples are not enough, and that is actually logical. We impute special significance to the geometric shapes from our experience, which the model does not have. It has to make up for that with a much larger number of examples. And in real-life applications it can actually do that: it can take in many more examples than a human could.

winstongludovatz

It seems like the models are just using brute force; it doesn't seem like they are really thinking or reasoning like a human.

mdkk

Just months ago, Chollet had a different threshold for AGI. Now he moves the goalposts to keep his benchmark, and his views, relevant.

pensiveintrovert

STEM grad: $10 per task @ ~99% efficiency.

And what were the companies saying about replacing employees again?

CielMC

Well, if it means that alignment goes out of the window, it's amazing news.

(Also, effectively doing an NP-complete search on every task is not AGI by any stretch.)

Nworthholf

Hey bro, how do you fine-tune a reasoning model like QwQ, etc.?

shubhamverma