OpenAI's o1: Has It Surpassed Claude 3.5 Sonnet? Testing with Cursor

Testing OpenAI's o1 Mini Model: First Impressions & Setup Guide

00:00 Introduction to the o1 Preview and o1 Mini Models
00:31 Setting Up the Models in Cursor
01:35 Evaluating the Model's Performance
02:31 Making Further Adjustments and Improvements
06:01 Final Thoughts and Recommendations
10:09 Conclusion and Call to Action
Comments
Author

Interesting. o1 is probably not the model for small fixes. Imagine o1 doing the planning and telling Claude which individual code snippets to produce. This is so wild!

sjkba
Author

At 3:32 you noticed that the inline edit was a bit slower. That's right: you ran the "Please Fix" prompt with Claude 3.5 at that moment, not with o1.

Björn-wv
Author

Thanks for the video! Going to be testing it today as well!

PatrickSteil
Author

I wonder how it performs on more complex tasks. The strong suit should be logic.

Thanks for the video

Sergio-Sanchez-com
Author

Claude 3.5 Sonnet is a very well-trained model.

gabrielsandstedt
Author

Really interesting to see all this leapfrogging.

crispinrovere
Author

🎯 Key points for quick navigation:

00:00:00 *🖥️ Overview of OpenAI's o1 Model Integration with Cursor*
- Overview of integrating OpenAI's o1 model with Cursor,
- Explanation of how to add models and API keys in Cursor settings.
00:01:23 *🛠️ Using the o1 Preview Model for Web Development*
- Comparison of o1 Preview and o1 Mini models for coding tasks,
- Limitations of o1 Preview model in streaming responses,
- Demonstration of generating web page content using Cursor's composer.
00:02:59 *🧩 Customizing UI Elements and Performance Considerations*
- Customization of generated web pages and UI components using Cursor,
- Discussion on the speed and responsiveness of Cursor's integration with OpenAI models,
- Considerations on cost and performance when using the o1 models for development tasks.

Made with HARPA AI

aistreet
Author

There are probably over 5 billion website boilerplate examples for the model to learn from. I don’t know. When you actually have to build non-boilerplate, it gets complicated, fast. I do use Cursor on a daily basis.

GrahamAnderson-zx
Author

We have figured out how to build AGI, but we are still limited by compute. Once we painstakingly get to AGI, the AGI will work to make computing better, and the two will feed each other. 2030 is the best guess for when we will get AGI. I think the O series is a complete architecture for AGI, unlike the GPT series. The way OpenAI has made tokens work as reasoning seems interesting. It acts like the internal monologue of a human brain. o1 continuously doubts itself when it "thinks" inside its chain of thought. It proposes alternatives, raises "buts", and also sticks to a solution if it sounds sufficiently promising. Overcoming context limits and making this inference-time compute indefinite will get us to AGI. Maybe O5 will be AGI.

rickandelon
Author

The problem with Next.js syntax was always interesting to me: you can train the model on new data, but what if it still has old data/docs inside it? In that case, does the model know the timeline of the information, so that it is aware which data is newer? If that makes sense.

levato
Author

It was in my Cursor without any setup yesterday.

digidope
Author

Does it surpass Claude 3.5 Sonnet in meeting your specific use cases?

cbgaming
Author

Every time I try other gen AI tools and give them a chance, I am always disappointed. That's why I keep coming back to ChatGPT. Of course they can answer basic to moderately complex things, but based on my usage they fail terribly. I tried Gemini too, but was always disappointed. Claude is better than Gemini imo, but it costs money and I get frustrated that it doesn't give enough tokens, RPM, etc. The free tier is too limited to use.

samuelmarndi
Author

Can I buy $1000 of credits to get access to tier 5 or do I have to spend $1000?

w.
Author

Or you could just do the logical thing and give it an example of the type of web site you want it to emulate, but we both know there are more dedicated web site building AI platforms better suited to this purpose.

ToolmakerOneNewsletter
Author

What does "stream back" even mean?

juhu
Author

Bro, can you try one thing: a Snake game in Python with a genetic algorithm? It's a good task to evaluate.

vivekbansal
Author

Meh, didn't see anything special. Sonnet 3.5 is already doing all this, if not even better.

Jha
Author

Honestly, o1 is garbage for what it's supposed to be. I haven't found a single use case where it's worth the time or extra cost to use it... With some extra effort in prompting, Claude performs just as well or better. I really don't get what everyone is so excited about.

avi