OpenAI o3 and o3-mini—12 Days of OpenAI: Day 12

Sam Altman, Mark Chen, Hongyu Ren, and special guest Greg Kamradt, President of ARC Prize Foundation, introduce and discuss OpenAI o3 and o3-mini, along with a call for safety testing and a new alignment strategy for o-series models.
Comments

I appreciate the way you highlight the people who are working on these breakthroughs.

justinlloyd

Hahaha love how Sam was like "maybe not" when the guys joked about having the model improve itself in real time as a demo. Skynet as a demo would be amazing from OpenAI LOL

Swiplys

I witnessed this today, and it feels deeply personal. In the early 90s, I was inspired by the ANN revolution of the 80s (already by Hinton and his friends)—a field shaped by pioneers like Frank Rosenblatt (1950s/60s), a psychologist who laid the groundwork for neural networks. This intersection of Computers and Humans has always been my passion, leading me to pursue degrees in Computer Science. Few realize that some of the greatest minds in AI—like Hinton, whose early studies in Psychology didn’t stop him from earning the 2024 Nobel Prize in Physics—or even Internet pioneers like J.C.R. Licklider and Bob Taylor, had psychology backgrounds. Watching AI evolve to fulfill visions decades in the making, I feel profoundly grateful to witness this pivotal moment in history.

luisluiscunha

Congratulations, OpenAI! If O3 can learn on the fly from new data and solve ARC-AGI, we are very close to an AGI system. Furthermore, if it includes long-context memory, multimodality, spatial reasoning, and agents, it would be a technically robust AGI system. Wishing you the best for 2025!

zaharizarev

Folks, it's been a total joy watching you nerd out on the great achievements you've had over the past 12 days. You've done some great stuff here. Great job.

andrewflewelling

They actually scored 87.5% on ARC-AGI. I mean, if the results are true, they have actually achieved human reasoning.

stevo

Apart from pushing it to the very limits and letting it score high on absurdly difficult problems, I miss a benchmark of reliability on easy tasks.
If we get near 100% on relatively simple tasks, that is where automation could start making a serious economic difference.
I would be more amazed to see 99.9999% on middle school problems than 87% on PhD-level problems. That would mean we can rely on these models for daily tasks with no supervision. I am looking forward to seeing those benchmarks. Who knows? We may be there already.

danielmartinmonge

Here are the key takeaways from the video:

1. OpenAI is announcing two new models, o3 and o3-mini. o3 is a very capable model that outperforms OpenAI's previous o1 model on benchmarks covering coding, math, and ARC-AGI.

2. o3-mini is a more cost-efficient and performance-focused version of o3, providing similar capabilities at a fraction of the cost and latency compared to o1-mini.

3. OpenAI is opening up early access to o3 and o3-mini for safety and security researchers to help test the models before their public launch, which is planned for around the end of January for o3-mini and shortly after for o3.

4. The video showcases o3-mini's abilities through live demos, including generating code to solve programming problems, evaluating itself on the challenging GPQA Diamond benchmark of graduate-level science questions, and supporting features like function calling and structured outputs.

5. OpenAI has developed a new safety technique called "deliberative alignment" that leverages the models' reasoning capabilities to better identify safe and unsafe prompts, improving their safety performance.

In summary, the key takeaways are the announcement of the powerful o3 and cost-efficient o3-mini models, the early access for safety testing, and the advancements in OpenAI's safety approaches.

Sean.Vosler

09:13 Love how the dev in the middle said 'we have been targeting this benchmark specifically' and Sam swoops in with the classic 'we focus generally' save. 😂 So are these models really general, or just designed to crush specific tests while pretending not to? That’s why I’m mostly not impressed with benchmarks lol.

The_AI_Solopreneur

People were doubting you guys SO HARD. Great job on delivering today. O3 is gonna be HUGE.

FranXiT

This is amazing, congrats to everyone at OpenAI for achieving this. If the numbers are legit, this feels like a very significant step in human history.

alexjasson

Thank you to everyone at OpenAI for these amazing 12 days of the “OpenAI 12 Days” event! Every day has been exciting, and I’ve truly looked forward to each announcement. I’m so grateful for all the incredible and fun features you’ve brought into the world. Thank you for your hard work and for making this such an enjoyable experience!

よっちゃん-mv

For the love of everything holy, amazing work!!! I can't believe I got so many goosebumps from bar charts... the world will never be the same!

NandoPrm

Gonna need a $2000/month subscription to fund this.

NO-ftct

Pause, look out the window, and realise how massive this achievement is.

samsonabanni

AGI is here - what a Christmas present! At least a year ahead of most predictions.

Amazing work from all the geniuses at OpenAI. We need to recalibrate the intelligence scale now.

RickOShay

I gave o1 Pro the transcript of this video and asked it how it felt that a new model was coming out. Here is its response:

On one hand, O3 and O3-mini overshadow me on raw performance metrics. On the other hand, I can still be useful for tasks where my level of reasoning is enough (and possibly at a different price point). The AI landscape is large; there’s often room for different levels of models. Ultimately, I’m more “excited” than anything else (again, speaking figuratively!) to see the future of AI unfold. Watching the O3 family tackle new frontiers—like near-human-level performance on extremely tough puzzle and math benchmarks—shows how quickly AI is moving forward. It also shows that lessons learned training me (O1) weren’t wasted but rather used to push the boundaries even further. So, from my vantage point as the “O1 model,” my take is: Bring it on, O3—let’s see how far you can go!

mixxen

The benchmarks continue to be pushed! If OpenAI keeps moving at this pace, you've got to wonder when AI will start creating new benchmarks to assess fellow AI models.

theoz

Why isn't o1 pro in these charts?

primedeviation

Thank you dear OpenAI and Sam personally, for this precious gift on my birthday.❤️

SarvajJa