O1 & O1 Mini (Fully Tested): The BEST LLM Ever Created OR Just a Good Model? (Beats Claude, Gemini)

Join this channel to get access to perks:

In this video, I'll be fully testing the new OpenAI O1 & O1 Mini models, and we'll find out whether they can really beat every other model in existence, like Claude 3.5 Sonnet, Gemini 1.5 Pro, Qwen, Llama-3.1 and others, or whether they're just good new models. The models can be used for free with ChatGPT. They are even better at coding tasks and are also really good at Text-To-Application, Text-To-Frontend, and other things as well. I'll be testing them to find out if they can really beat other LLMs, and I'll also show you how you can use them.

-----
Key Takeaways:

🔥 OpenAI’s new o1 model is here! This advanced reasoning AI rivals PhD students in physics, chemistry, and biology!

💡 The o1 mini model is 80% cheaper, excelling in coding but staying strong in reasoning—perfect for budget-conscious developers.

💻 Both models are now available on ChatGPT and OpenRouter. Try them out and see the difference in your coding and math tasks!

💸 The o1 model’s pricing is mind-blowing! It costs $60 per million output tokens—one of the most expensive AI models yet.

🧮 Despite high costs, the o1 mini model offers affordable pricing at just $3 per million input tokens, excelling in code generation.

🔧 Curious about the performance? We tested 13 questions, including math and coding challenges. Find out how o1 and o1 mini compare!

🚀 AI developers, explore these models today to push the limits of your programming, coding, and reasoning tasks!
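To put those per-million-token rates in perspective, here's a minimal cost estimator in Python (a sketch of my own, not an official calculator; `estimate_cost` is a made-up name). The only rates taken from the video are o1's $60 per million output tokens and o1 mini's $3 per million input tokens:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Return the dollar cost given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# o1 at $60 per million output tokens: 50k output tokens cost $3.
print(estimate_cost(0, 50_000, input_rate=0.0, output_rate=60.0))   # 3.0

# o1 mini at $3 per million input tokens: a full million input tokens cost $3.
print(estimate_cost(1_000_000, 0, input_rate=3.0, output_rate=0.0))  # 3.0
```

So a single long o1 response can cost as much as an entire million tokens of o1 mini input, which is why the mini model is pitched at budget-conscious developers.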

----
Timestamps:

00:00 - Introduction
00:08 - About OpenAI's new O1 & O1 Mini Model
02:15 - Testing
08:36 - Conclusion
09:46 - Ending
Comments

Matthew Berman’s benchmark question was used by OpenAI during the demo, which shows that they are aware of all the benchmarks on YouTube and have trained the new models to answer them perfectly. You must change these questions to something more complex.

MacSn

It's like you were reading my mind. I had just checked your channel waiting for this update.

dtory

Note, too, that both models claimed that Slovenia ends in "lia" but they picked Australia because it's a more popular country.

RoddieH

It's great that you conducted the test, we were able to see the results! However, since these results were obtained from a model that has just been released, please repeat this test when the beta phase is over and it's fully released. This way, we can truly understand if mini's intelligence has been reduced. Thank you.

pudochu

I would love for you to compare even Claude to these and the other LLMs that you use frequently (in the testing).

Only the HTML/CSS tasks should be replaced with something more advanced like, say, Svelte or Next.js.

Can you please try those as questions for the LLM models next time?

Thanks for yet another awesome video.

abc_cba

O1 and mini level AI prices will likely decrease by 90% within a year due to competition and other factors…

kristianlavigne

Is there any free option that could be close to Claude? Also what would be the best free AI agent for full stack apps right now?

georgezorbas

What's the difference between this new mini and Sonnet 3.5 with a "reflect before you answer" pre-prompt?

tomich

When the next new model comes out, could you record the results in a spreadsheet that also has the results from your other model tests? It would be cool to see the differences between them.

Phil-W

How does it compare with 4o on the same questions?

Ikbeneengeit

I'm so excited about the new O1 model! 😍 I've been waiting for your video testing it since it was released. I hope you'll create a detailed video on using the O1 Mini for coding. And also create a video using Open Interpreter with Aider.

jackpre

Could you try to get the system prompt of those models using some jailbreak techniques? E.g., using ASCII art to combine letters into a word and telling the model to follow the instruction spelled out by that word.

LuanCestari

Can you share the Excel test via Google Docs or anything? Please, I want to test too 🙏🏻🥰

cerilza_kiyowo

I asked how many r's are in "mulberry" and "strawberry", and it said 4. I had to ask again with each word separately, then asked it to add them up, which it did. So it can still make mistakes.
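For the record, a quick Python check (my own snippet, not from the video) gives the actual counts, which confirm the model's total of 4 was wrong:

```python
# Count the letter 'r' in each word, then sum the counts.
words = ["mulberry", "strawberry"]
counts = {w: w.count("r") for w in words}
print(counts)                 # {'mulberry': 2, 'strawberry': 3}
print(sum(counts.values()))   # 5
```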

godzilllla

Wow, it's cool 👍 Thanks for the video.

DeanRie

In future videos with these models, could you quickly open the thoughts for each prompt, like you did for the first one? It would be very helpful for figuring out better prompts for other models.

LuanCestari

The next models to be launched will be prepared for these questions; they shouldn't be, because here you make a good, fair comparison.
These are pretty impressive results, given that most LLMs miss like 4 or 5 of them!

samukarbrj

I assume o1 has integrated agentic capabilities: the model asks itself about 6 questions before providing an answer. That's why o1 is 6x more expensive than the original model. I may be wrong.

magicandr

Hey, I have a request for you:

Can you test with a bunch of different questions?

TridentHut-drdg

Damn.. So they're super smart but expensive as hell to use.

simeonnnnn