O1 & O1 Mini (Fully Tested): The BEST LLM Ever Created OR Just a Good Model? (Beats Claude, Gemini)

Join this channel to get access to perks:

In this video, I'll be fully testing the new OpenAI O1 & O1 Mini models, and we'll find out whether they can really beat every other model in existence, like Claude 3.5 Sonnet, Gemini 1.5 Pro, Qwen, Llama-3.1 and others, or whether they're just good new models. The models can be used for free with ChatGPT. They are even better at coding tasks and are also really good at Text-To-Application, Text-To-Frontend, and other things as well. I'll be testing them to find out if they can really beat other LLMs, and I'll also show you how you can use them.

-----
Key Takeaways:

🔥 OpenAI’s new o1 model is here! This advanced reasoning AI rivals PhD students in physics, chemistry, and biology!

💡 The o1 mini model is 80% cheaper, excelling in coding but staying strong in reasoning—perfect for budget-conscious developers.

💻 Both models are now available on ChatGPT and OpenRouter. Try them out and see the difference in your coding and math tasks!

💸 The o1 model’s pricing is mind-blowing! It costs $60 per million output tokens—one of the most expensive AI models yet.

🧮 Despite high costs, the o1 mini model offers affordable pricing at just $3 per million input tokens, excelling in code generation.

🔧 Curious about the performance? We tested 13 questions, including math and coding challenges. Find out how o1 and o1 mini compare!

🚀 AI developers, explore these models today to push the limits of your programming, coding, and reasoning tasks!
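To put those per-million-token rates in perspective, here's a minimal cost estimator in Python (a sketch of my own, not an official calculator; `estimate_cost` is a made-up name). The only rates taken from the video are o1's $60 per million output tokens and o1 mini's $3 per million input tokens:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float, output_rate: float) -> float:
    """Return the dollar cost given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# o1 at $60 per million output tokens: 50k output tokens cost $3.
print(estimate_cost(0, 50_000, input_rate=0.0, output_rate=60.0))   # 3.0

# o1 mini at $3 per million input tokens: a full million input tokens cost $3.
print(estimate_cost(1_000_000, 0, input_rate=3.0, output_rate=0.0))  # 3.0
```

So a single long o1 response can cost as much as an entire million tokens of o1 mini input, which is why the mini model is pitched at budget-conscious developers.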

----
Timestamps:

00:00 - Introduction
00:08 - About OpenAI's new O1 & O1 Mini Model
02:15 - Testing
08:36 - Conclusion
09:46 - Ending
Comments

Matthew Berman’s benchmark question was used by OpenAI during the demo, which shows that they are aware of all the benchmarks on YouTube and have trained the new models to answer them perfectly. You must change these questions to something more complex.

MacSn

It's like you were reading my mind. I had just checked your channel waiting for this update.

dtory

Note, too, that both models claimed that Slovenia ends in "lia" but they picked Australia because it's a more popular country.

RoddieH

It's great that you conducted the test, we were able to see the results! However, since these results were obtained from a model that has just been released, please repeat this test when the beta phase is over and it's fully released. This way, we can truly understand if mini's intelligence has been reduced. Thank you.

pudochu

I would love for you to compare even Claude to these and the other LLMs that you use frequently (in the testing).

Only the HTML/CSS tasks should be replaced with something more advanced like, say, Svelte or Next.js.

Can you please try those as questions for the LLM models next time?

Thanks for yet another awesome video.

abc_cba

O1 and mini level AI prices will likely decrease by 90% within a year due to competition and other factors…

kristianlavigne

Is there any free option that could be close to Claude? Also what would be the best free AI agent for full stack apps right now?

georgezorbas

What's the difference between this new mini and Sonnet 3.5 with a "reflect before you answer" pre-prompt?

tomich

When the next new model comes out, could you record the results in a spreadsheet that also has the results from your other model tests? It would be cool to see the differences between them.

Phil-W

How does it compare with 4o on the same questions?

Ikbeneengeit

I'm so excited about the new O1 model! 😍 I've been waiting for your video testing it since it was released. I hope you'll create a detailed video on using the O1 Mini for coding. And also create a video using Open Interpreter with Aider.

jackpre

Could you try to get the system prompt of those models using some jailbreak techniques? E.g., using ASCII art to combine letters into a word and telling the model to follow the instruction spelled out by that word.

LuanCestari

Can you share the Excel test via Google Docs or anything? Please, I want to test too 🙏🏻🥰

cerilza_kiyowo

I asked how many r's are in "mulberry" and "strawberry", and it said 4. I had to ask again with each word separately, then asked it to add them up, which it did. So it can still make mistakes.
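For the record, a quick Python check (my own snippet, not from the video) gives the actual counts, which confirm the model's total of 4 was wrong:

```python
# Count the letter 'r' in each word, then sum the counts.
words = ["mulberry", "strawberry"]
counts = {w: w.count("r") for w in words}
print(counts)                 # {'mulberry': 2, 'strawberry': 3}
print(sum(counts.values()))   # 5
```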

godzilllla

Wow, it's cool 👍 Thanks for the video.

DeanRie

In future videos with these models, could you quickly open the thoughts for each prompt, like you did for the first one? It would be very helpful for figuring out better prompts for other models.

LuanCestari

The next models to be launched will be prepared for these questions; they shouldn't be, because here you make a good, fair comparison.
These are pretty impressive results, given that most LLMs miss like 4 or 5 of them!

samukarbrj

I assume o1 has integrated agentic capabilities: the model asks itself about 6 questions before providing an answer. That's why o1 is 6x more expensive than the original model. I may be wrong.

magicandr

Hey, I have a request for you:

Can you test with a bunch of different questions?

TridentHut-drdg

Damn.. So they're super smart but expensive as hell to use.

simeonnnnn