OpenAI's o1: Has It Surpassed Claude 3.5 Sonnet? Testing with Cursor

Testing OpenAI's o1 Mini Model: First Impressions & Setup Guide

00:00 Introduction to the o1 Preview and o1 Mini Models
00:31 Setting Up the Models in Cursor
01:35 Evaluating the Model's Performance
02:31 Making Further Adjustments and Improvements
06:01 Final Thoughts and Recommendations
10:09 Conclusion and Call to Action

Developers Digest

Comments

Interesting. o1 is probably not the model for small fixes. Imagine o1 doing the planning and telling Claude which individual code snippets to produce. This is so wild!

sjkba

At 3:32 you wondered why the inline edit was a bit slower. That's expected, because you ran the "Please Fix" prompt with Claude 3.5 at that moment, not with o1.

Björn-wv

Thanks for the video! Going to be testing it today as well!

PatrickSteil

I wonder how it performs on more complex tasks. The strong suit should be logic.

Thanks for the video

Sergio-Sanchez-com

Claude 3.5 Sonnet is a very well-trained model

gabrielsandstedt

Really interesting to see all this leapfrogging.

crispinrovere

🎯 Key points for quick navigation:

00:00:00 *🖥️ Overview of OpenAI's o1 Model Integration with Cursor*
- Overview of integrating OpenAI's o1 model with Cursor,
- Explanation of how to add models and API keys in Cursor settings.
00:01:23 *🛠️ Using the o1 Preview Model for Web Development*
- Comparison of o1 Preview and o1 Mini models for coding tasks,
- Limitations of o1 Preview model in streaming responses,
- Demonstration of generating web page content using Cursor's composer.
00:02:59 *🧩 Customizing UI Elements and Performance Considerations*
- Customization of generated web pages and UI components using Cursor,
- Discussion on the speed and responsiveness of Cursor's integration with OpenAI models,
- Considerations on cost and performance when using the o1 models for development tasks.

Made with HARPA AI

aistreet

There are probably over 5 billion website boilerplate examples for the model to learn from. I don’t know. When you actually have to build non-boilerplate, it gets complicated, fast. I do use Cursor on a daily basis.

GrahamAnderson-zx

We have figured out how to build AGI, but we are still limited by computation. Once we painstakingly get to AGI, the AGI will work to make computing better, and the two will feed each other. 2030 is the best guess for when we will get AGI. I think the o series is a complete architecture for AGI, unlike the GPT series. The way OpenAI has made tokens work as reasoning seems interesting. It acts like the internal monologue of a human brain. o1 continuously doubts itself when it "thinks" inside its chain of thought. It proposes alternatives, raises objections, and also sticks to a solution if it sounds sufficiently promising. Overcoming context limits and making this inference-time compute indefinite will get us to AGI. Maybe o5 will be AGI.

rickandelon

The problem with Next.js syntax was always interesting to me, in the sense that you can train the model on new data, but what if it still has old data/docs inside the model? In that case, does the model know the timeline of the information, so it is aware of which data is newer? If that makes sense.

levato

It was in my cursor without any setup yesterday.

digidope

Does it surpass Claude 3.5 Sonnet in meeting your specific use cases?

cbgaming

Every time I try other gen AI tools and give them a chance, I am always disappointed. That's why I keep coming back to ChatGPT. Of course they can answer basic to moderately complex things, but based on my usage they fail terribly. I tried to use Gemini too but was always disappointed. Claude is better than Gemini imo, but it costs money and I feel frustrated that it doesn't give enough tokens, RPM, etc. The free tier is too limited to use.

samuelmarndi

Can I buy $1000 of credits to get access to tier 5 or do I have to spend $1000?

w.

Or you could just do the logical thing and give it an example of the type of web site you want it to emulate, but we both know there are more dedicated web site building AI platforms better suited to this purpose.

ToolmakerOneNewsletter

What does "stream back" even mean?

juhu

Bro, can you try one thing: a snake game in Python with a genetic algorithm. It's a good task to evaluate.

vivekbansal

meh, didn't see anything special - sonnet 3.5 is already doing all this, if not even better.

Jha

honestly o1 is garbage for what it's supposed to be. I haven't found a single use case where it's worth the time or extra cost to use it... With some extra effort in prompting, Claude performs just as well or better. I really don't get what everyone is so excited about.

avi
