There's Something Weird About ChatGPT o1 Use Cases...

What are the best use cases for the "thinking" o1 model?

Join My Newsletter for Regular AI Updates 👇🏼

My Links 🔗

Media/Sponsorship Inquiries ✅
Comments

Drop the use case you've only been able to get working with o1 models here.

matthew_berman

Here's how I've been using the o1 models: I will be working back & forth with 4o, let's say on a programming project, and get to a point where the model just can't quite get past an error, and we're going in circles. At that moment, I say: "OK, I need you to summarize where we are, and write the perfect prompt for o1 so it can hopefully get us past this error." 4o then writes the prompt, I start a new session with o1, and it pretty much always fixes it. Then I give that info back to my session with 4o, and we continue.
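For anyone who wants to reproduce that handoff programmatically rather than in the ChatGPT UI, here is a minimal sketch, assuming the OpenAI Python SDK and the public "gpt-4o" / "o1-preview" model names; the function name and prompt wording are illustrative, not taken from the comment.

```python
# Hypothetical sketch of the 4o -> o1 handoff described above.
# Requires the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

def escalate_to_o1(conversation_so_far: str) -> str:
    """Ask 4o to condense a stuck debugging session into one prompt,
    then hand that prompt to o1 in a fresh session."""
    # Step 1: have 4o write the handoff prompt.
    handoff_prompt = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                "We are stuck on an error and going in circles. "
                "Summarize where we are and write the perfect prompt for a "
                "reasoning model so it can get us past this error.\n\n"
                + conversation_so_far
            ),
        }],
    ).choices[0].message.content

    # Step 2: start a fresh o1 session with only that prompt.
    answer = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user", "content": handoff_prompt}],
    ).choices[0].message.content

    # Step 3: the caller pastes the answer back into the ongoing 4o session.
    return answer
```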

jwm

I worked in corporate settings for 30 years. I rarely saw much "strategic" thinking done by executive leadership. Most of the time, people are running around putting out fires and trying to implement things as fast as possible. There is no time for "strategic" thinking.

Steve-xhby

Use cases:
1. Write complex code
2. Solve complex math problems
3. Suggest better strategies for solving problems

These are the use cases where I've found it crushes anything else out there.

hqcart

I can see AI therapy becoming a huge thing. There's always a shortage of therapists, and what therapists do is quite formulaic: they ask leading questions to make the patient sort of solve their own problems by thinking about them from new perspectives. Sure, there are therapists who aren't just following a dialogue tree, but I'd say that a sufficiently advanced LLM could equal or surpass the average therapist in the near future.

MidWitPride

Here's a weird one: I was using voice mode for my son's bedtime story (he asks for it now). I asked for a detailed and dynamic story about him winning a soccer tournament. It made a roaring crowd sound effect when he "scored." I asked my son if he heard it too, and he did. Then it did it again and we both noticed. It did it at least 3 times. I thought it was some new feature, so I asked ChatGPT. It insisted it hadn't done it and wasn't capable of it!

mikeedwards

The reason you didn't see much difference is that he used well-structured prompts. I think it really comes down to how well the prompt is written, because most people do not write well-structured prompts. o1-preview refactors their prompt before answering the question. This is where most of the progress is seen; other than that, 4o is just as good for most cases.

stefano

I have found that o1 is simply much better at writing code. I have it refactor code and write pytest tests, and it does wonderfully.
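Purely as an illustration of the kind of output meant here, a tiny pytest sketch; the slugify function and its tests are invented for this example, not taken from the comment.

```python
# Illustrative only: a small function plus the style of pytest tests
# one might ask a model to refactor or generate.
import pytest

def slugify(text: str) -> str:
    """Lowercase, trim, and join words with hyphens."""
    return "-".join(text.strip().lower().split())

@pytest.mark.parametrize(
    ("raw", "expected"),
    [
        ("Hello World", "hello-world"),
        ("  Multiple   spaces  ", "multiple-spaces"),
        ("already-slugged", "already-slugged"),
    ],
)
def test_slugify(raw, expected):
    assert slugify(raw) == expected
```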

Kivalt

The surprising thing is that, for most people's use cases, our LLMs are actually good enough at reasoning. What is holding them back is efficient thinking over huge contexts and constant training (growing) on experience.

NilsEchterling

I have thoroughly tested o1 over 3 days, for multiple hours every day (yep, it cost me a fortune). I concluded that it's absolutely not worth it; the price difference to other models is staggering (we are talking almost 20x the price, since you also get billed for the invisible thinking tokens). Sure, in some cases it produces really cool responses (but at what cost?), but most of the time it's roughly the same result, or sometimes even much worse. I literally had it tell me it wouldn't be able to answer something, but it still wasted a lot of thinking beforehand, essentially charging me 30 cents for what is basically just a "nope."

dubesor

Recently, I have been using 4o often to write VBA code and helper functions for MS Access, Excel, and Word. I found many edge case examples where 4o was simply unable to code certain apparently simple tasks. For example, writing a subroutine to replace an Excel table at a certain bookmark within a Word document. I spent hours with 4o and it kept spinning in circles attempting its failed approach over and over. It kept placing the updated table inside the prior table instead of replacing it. I assumed this use case was something that 4o had very little training data on (specifically training on nuances of Word bookmarks with tables).

When I tried o1-preview, it started with the same failed approach, but when I described the failed behavior that I observed, o1 took only two more shots to come up with working code. With 4o, you could see that it was only tweaking its failed approach with no success, but with o1, it was clearly trying completely different approaches and revealed its reasoning along the way.

BCrawford-xjqp

I've been cut-and-pasting a program into o1-preview and asking it to refactor or improve the code, then I take the output of o1 and cut-and-paste its suggestions into GPT-4o-canvas to get it to implement the changes. It's reasonably effective, though canvas sometimes still removes parts of the code it should leave if there are more than 100-200 lines of code.

nathanbanks

The mindset I've had when using o1 is: if I feel like I'm stuck with 4o, I go to o1. The prompts I use with o1 are very long and complex because I want it to get all the context of what I want it to work on. Giving it simple prompts that 4o can handle feels like wasting its time (and mine, since it takes time to think) and downplaying its capabilities.

spiker.c

We have tested 7 leading LLMs on our Reasoning Benchmark, and o1 was amazing on it: it got 80%, while Claude 3.5 Sonnet could only solve 2 questions and got 20%. I would never have guessed it! I honestly thought Claude would be in second place. What we also noticed is that the code produced by o1 is often of high quality, but on par with Claude.

TuringTears

I have found o1-preview is essential for tasks such as converting code from one programming language to another, where there are fundamental differences in the languages, such that a simple mapping of code is not the solution.

4o fails to take into account the complexity in such cases, and the code it produces fails, whereas the code from o1-preview works.

You can tell when it's a real challenge because o1-preview spends 60 seconds on the task.

trader

I assumed that 4o used tooling under the hood. If that's the case, then we're actually comparing 4o + tooling (such as a calculator) to o1 by itself. In which case… that's still very impressive. It would pave the way for o1 to be cheaper than 4o as well.

matthewbartonisme

I barely use o1. I think the best example I had for it was when I needed to calculate the wattage of solar panels I would need for my van, given that I want to charge x batteries, run x appliances, a fridge, lights, etc. It came back with a really good answer compared to 4o: much more in line with my previous calculations, and it took things like usable daylight into account.
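A rough back-of-the-envelope version of that kind of calculation; every number below (appliance draws, sun hours, loss factor) is a made-up example, not a value from the comment.

```python
# Hypothetical solar sizing estimate for a van build.
daily_loads_wh = {
    "fridge": 45 * 24 * 0.35,   # 45 W compressor running ~35% of the day
    "lights": 10 * 4,           # 10 W of LEDs for 4 hours
    "laptop_charging": 60 * 2,  # 60 W charger for 2 hours
}

usable_sun_hours = 4.0   # average peak-sun hours per day
system_losses = 0.75     # wiring, charge controller, panel angle, heat

total_daily_wh = sum(daily_loads_wh.values())
required_panel_watts = total_daily_wh / (usable_sun_hours * system_losses)

print(f"Daily consumption: {total_daily_wh:.0f} Wh")
print(f"Suggested panel size: {required_panel_watts:.0f} W")
```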

ChrisIsOutside

o1 needs access to tools during its internal reasoning phase. Looking up recent events online to confirm them or getting the most recent documentation for a Python library would be invaluable, as would writing and running code as part of the thought process. o1 (or o2) would be unstoppable if it could integrate with other OpenAI services.

notnotandrew

I think it is weird that many responses are bulleted lists.

DannerBanks

This video is the reason your channel, @Mathew, is on my "Only let these notifications get through to me" list, and then there's the added value from the comments section.

tuaitituaiti