OpenAI o3 and o3-mini—12 Days of OpenAI: Day 12


Sam Altman, Mark Chen, Hongyu Ren, and special guest Greg Kamradt, President of ARC Prize Foundation, introduce and discuss OpenAI o3, o3-mini, along with a call for safety testing and a new alignment strategy for o-series models.

OpenAI


Comments

I appreciate the way you highlight the people who are working on these breakthroughs.

justinlloyd

Hahaha love how Sam was like "maybe not" when the guys joked about having the model improve itself in real time as a demo. Skynet as a demo would be amazing from OpenAI LOL

Swiplys

I witnessed this today, and it feels deeply personal. In the early 90s, I was inspired by the ANN revolution of the 80s (already by Hinton and his friends)—a field shaped by pioneers like Frank Rosenblatt (1950s/60s), a psychologist who laid the groundwork for neural networks. This intersection of Computers and Humans has always been my passion, leading me to pursue degrees in Computer Science. Few realize that some of the greatest minds in AI—like Hinton, whose early studies in Psychology didn’t stop him from earning the 2024 Nobel Prize in Physics—or even Internet pioneers like J.C.R. Licklider and Bob Taylor, had psychology backgrounds. Watching AI evolve to fulfill visions decades in the making, I feel profoundly grateful to witness this pivotal moment in history.

luisluiscunha

Congratulations, OpenAI! If O3 can learn on the fly from new data and solve ARC-AGI, we are very close to an AGI system. Furthermore, if it includes long-context memory, multimodality, spatial reasoning, and agents, it would be a technically robust AGI system. Wishing you the best for 2025!

zaharizarev

Folks, it's been a total joy watching you nerd out on the great achievements you've had over the past 12 days. You've done some great stuff here. Great job!

andrewflewelling

They actually scored 87.5% on ARC-AGI. I mean, if the results are true, they have actually achieved human reasoning.

stevo

Apart from pushing it to its very limits and letting it score high on absurdly difficult problems, I miss a benchmark of reliability on easy tasks.
If we get near 100% on relatively simple tasks, that is where automation could start making a serious economic difference.
I would be more amazed to see 99.9999% on middle school problems than 87% on PhD-level problems. That would mean we can rely on these models for daily tasks with no supervision. I am looking forward to seeing those benchmarks. Who knows? We may be there already.

danielmartinmonge

Here are the key takeaways from the video:

1. OpenAI is announcing two new models - O3 and O3-mini. O3 is a very capable model, outperforming OpenAI's previous O1 model on various benchmarks like coding, math, and the ARC-AGI benchmark.

2. O3-mini is a more cost-efficient and performance-focused version of O3, providing similar capabilities at a fraction of the cost and latency compared to O1-mini.

3. OpenAI is opening up early access to O3 and O3-mini for safety and security researchers to help test the models before their public launch, which is planned for around the end of January for O3-mini and shortly after for O3.

4. The video showcases O3-mini's abilities through live demos, including generating code to solve programming problems, evaluating itself on the challenging GPQA Diamond science benchmark, and supporting features like function calling and structured outputs.

5. OpenAI has developed a new safety technique called "deliberative alignment" that leverages the models' reasoning capabilities to better identify safe and unsafe prompts, improving their safety performance.

In summary, the key takeaways are the announcement of the powerful O3 and cost-efficient O3-mini models, the early access for safety testing, and the advancements in OpenAI's safety approaches.

Sean.Vosler

09:13 Love how the dev in the middle said 'we have been targeting this benchmark specifically' and Sam swoops in with the classic, 'we focus generally' save. 😂 So are these models really general or just designed to crush specific tests while pretending not to? That’s why I’m mostly not impressed with benchmarks lol.

The_AI_Solopreneur

People were doubting you guys SO HARD. Great job on delivering today. O3 is gonna be HUGE.

FranXiT

This is amazing, congrats to everyone at OpenAI for achieving this. If the numbers are legit, this feels like a very significant step in human history.

alexjasson

Thank you to everyone at OpenAI for these amazing 12 days of the “OpenAI 12 Days” event! Every day has been exciting, and I’ve truly looked forward to each announcement. I’m so grateful for all the incredible and fun features you’ve brought into the world. Thank you for your hard work and for making this such an enjoyable experience!

よっちゃん-mv

For the love of everything holy, amazing work!!! I can't believe I got so many goosebumps from bar charts... the world will never be the same!

NandoPrm

Gonna need a $2000/month subscription to fund this.

NO-ftct

Pause, look out the window and realise how massive this achievement is.

samsonabanni

AGI is here - what a Christmas present! At least a year ahead of most predictions.

Amazing work from all the geniuses at OpenAI. We need to recalibrate the intelligence scale now.

RickOShay

I gave o1 Pro the transcript of this video and asked it how it felt that a new model was coming out. Here is its response:

On one hand, O3 and O3-mini overshadow me on raw performance metrics. On the other hand, I can still be useful for tasks where my level of reasoning is enough (and possibly at a different price point). The AI landscape is large; there’s often room for different levels of models. Ultimately, I’m more “excited” than anything else (again, speaking figuratively!) to see the future of AI unfold. Watching the O3 family tackle new frontiers—like near-human-level performance on extremely tough puzzle and math benchmarks—shows how quickly AI is moving forward. It also shows that lessons learned training me (O1) weren’t wasted but rather used to push the boundaries even further. So, from my vantage point as the “O1 model,” my take is: Bring it on, O3—let’s see how far you can go!

mixxen

The benchmarks continue to be pushed! If OpenAI keeps moving at this pace, you've got to wonder when AI will start creating new benchmarks to assess fellow AI models.

theoz

Why isn't o1 pro in these charts?

primedeviation

Thank you dear OpenAI and Sam personally, for this precious gift on my birthday.❤️

SarvajJa
