ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview)

o1. Strawberry. Q-Star. We finally get answers to what the next generation of LLM reasoning abilities will bring us. And it’s good. Better than I thought. SimpleBench partial results, plus an analysis of the full OpenAI system card, press release, new capabilities, benchmarks and far, far more. It is a step-change, and no, that’s not just hype. Welcome to the o1 era.

Chapters:
00:00-26:55: o1 ChatGPT - OpenAI

Comments

Instead of watching OpenAI's own videos, I actually wanted this channel to explain the new model to me. 😂😄😄

Maksi

Please don’t burn yourself out getting your video out immediately. I think many of us who watch your videos always look forward to your in-depth analysis, but we also understand that it might take you a little longer to put your information out because of the amount of thought and work you put in.

mattbelcher

This review is MILES better than the others I have watched.

mcfarlangeoffrey

26:30 such a good phrase "stochastic parrots can fly so high"

pauljones

A new model released? Philip covers every detail within 24h. Just impressive work!

medicalaiexplained

Wow. Quick. And the only non-hype opinion that matters. Time to settle down and listen to reason . . .

HAL.

This is not a paradigm shift for my test cases.

1. It failed my custom ball physics test (not the one everyone else uses, since that one is making its way into fine-tuning/training data, as seen in the Two Minute Papers video), because it still doesn't understand physics intuitively the way a human does. It assumes a ball would not fall out of an upside-down ceramic coffee cup held above a table. The model says "2. Cup is held upside down above the table: Assuming the ball doesn't fall out, it remains inside the inverted cup."
2. It was unable to solve a React.js viewport lazy-loading bug. The issue is that more and more images keep loading as you scroll, so the viewport jumps around, because it loads more images before the previous ones have finished loading. It couldn't grasp this without me doing most of the work and reasoning for it. It kept trying to adjust image sizes and CSS, which were not the problem.
3. It failed to find the optimal way to set a square post at a 45-degree angle to a wall. I said I only had a tape measure, but it kept wanting me to mark out spots and measure to the center of the hole. My solution involved simply turning the post and measuring until both corners are the same distance from the wall. It wanted me to triangulate the hole on the other side of the wall, and it assumed the post would be in the center of the hole (it leans at 5 degrees, so it is not). It did recognize that my solution was better and more direct.
4. I asked it "What is the prime factorization of 1,090,101, providing the prime factors and their respective exponents?" and it got it wrong. It does not have access to Python or a calculator; once it does, it will likely be a big improvement. ChatGPT-4o gets this one right by running Python code. I've seen ChatGPT-4 do some impressive stuff when told to solve a problem with brute-force code, even better than what ChatGPT o1 does now.
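The commenter notes that GPT-4o gets the factorization right by running Python code. As a rough illustration of that brute-force approach, here is a minimal sketch using plain trial division. It is not the code ChatGPT actually generates, and the function name `prime_factorization` is our own.

```python
def prime_factorization(n: int) -> dict[int, int]:
    """Return {prime: exponent} for n >= 2 using simple trial division."""
    if n < 2:
        raise ValueError("n must be >= 2")
    factors: dict[int, int] = {}
    d = 2
    # Only divisors up to sqrt(n) need testing; n shrinks as factors are removed.
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        # After 2, skip even candidates: 2 -> 3 -> 5 -> 7 -> ...
        d += 1 if d == 2 else 2
    # Whatever remains above 1 is itself a prime factor.
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


if __name__ == "__main__":
    result = prime_factorization(1_090_101)
    print(" x ".join(f"{p}^{e}" for p, e in sorted(result.items())))
```

Trial division is plenty at this size: it never needs to test divisors beyond √1,090,101 ≈ 1,044, which is trivial for a computer but error-prone when a model tries to do it "from memory".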

As for what it's good at: it can understand and output much more code, and it does better at refactoring and at handling code that spans multiple files. It's definitely an improvement, but when it comes to coding ability, it's about the same step up that Claude 3.5, with its big context length, was over ChatGPT-4.

If you want a real paradigm shift, have it analyze its own text for assumptions and follow up on them with a question before continuing. This one change made it solve the ball and cup problem 100% of the time. Just ask: "Did you make any assumptions? If so, correct them, or follow up with a question if more information is needed."

Response when using this method:

"Assumption Made: I initially assumed the ball stayed inside the cup.
Reality: Unless the cup has a lid or the ball is held in place, gravity would cause the ball to fall out when the cup is inverted."

Also

"Follow-Up:

Was there anything preventing the ball from falling out when the cup was inverted (e.g., a lid, the ball being stuck, or someone holding it in place)?
Is there additional information about the ball's behavior during the inversion?"

These are all very important questions about my scenario that I was not 100% clear about, although I did specify a ceramic coffee cup, because such cups have no lid and can't be squeezed to hold the ball in.

It fixes many of the weird responses, because the model gets a chance to notice and correct them. It's trained to recognize poor responses just as it's trained to give the correct response the first time, which means you can increase its ability by tapping into this knowledge. This will lead to much better answers and real back-and-forth interaction to break down and solve a problem, rather than the model taking a best guess each time.
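The two-step prompt described above can be sketched in Python as follows. This is a minimal sketch assuming a generic `ask` callable that sends a prompt to whatever model you use and returns its text; `ask`, `answer_with_assumption_check`, and the stub model are hypothetical names for illustration, not part of any OpenAI API.

```python
from typing import Callable

# Hypothetical type: any function that sends a prompt to a model and returns its reply.
AskFn = Callable[[str], str]

# The follow-up instruction quoted in the comment above.
ASSUMPTION_CHECK = (
    "Did you make any assumptions? If so, correct them, "
    "or follow up with a question if more information is needed."
)


def answer_with_assumption_check(ask: AskFn, question: str) -> tuple[str, str]:
    """Return (first_answer, review), where review audits the first answer's assumptions."""
    first = ask(question)
    # Send the question and first answer back so the model can check its own reasoning.
    review_prompt = (
        f"Question:\n{question}\n\n"
        f"Your previous answer:\n{first}\n\n"
        f"{ASSUMPTION_CHECK}"
    )
    review = ask(review_prompt)
    return first, review


if __name__ == "__main__":
    # Stub model for demonstration only; replace with a real API call.
    def fake_model(prompt: str) -> str:
        if ASSUMPTION_CHECK in prompt:
            return "Assumption Made: I initially assumed the ball stayed inside the cup."
        return "Assuming the ball doesn't fall out, it remains inside the inverted cup."

    first, review = answer_with_assumption_check(
        fake_model,
        "A ball is in a ceramic cup held upside down above a table. Where is the ball?",
    )
    print(first)
    print(review)
```

In an actual chat session you would simply send the follow-up as your next message, since the conversation history already carries the context that this sketch passes explicitly.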

Most people don't use it to solve the types of problems they are training it on. They need to make it more deductive rather than more predictive, so that it's better at doing research and finding the correct answer in the noise, rather than knowing the answer outright. It needs a BS detector, and it needs to be good at using tools. An LLM will never be better than a calculator at arithmetic, so why are they making it do math from memory? You're welcome, OpenAI.

BlakeEM

I want to sincerely thank you for the effort you put into these videos

sagetmaster

Your channel was the first I ever subscribed to, just over 18 months ago, and because of the quality (today's video included) I have never missed an episode. Thanks for explaining things so clearly and without the hype.

alexclark

I'm glad they are trying ideas other than simply scaling. These unique ideas on top of scaling will be needed to reach human level reasoning and analysis.

jackfarris

I saw another video pop up saying they had released o1. I didn't watch it but waited to watch your video on it. You are the best, the most reliable, and super fast at releasing these videos.

wagnsprinter

OH MY GOD THIS GUY IS THE FINAL BOSS OF AI ANALYSIS HOLY SHIT. The model has been out for 1 day.
Meanwhile AI Explained: 0:29

mAny_oThERSs

I have to admit, I never thought LLMs could be pushed this far

AgentStarke

It seems every model so far, because of how they are trained, simply cannot conclude that it isn't sure or doesn't know something. Not mentioning that it can't pull real URLs, and making some up instead, is a great example. Admitting that inability shouldn't be punished; realizing that one can't do something or doesn't know is crucial to figuring out the facts. I've even tried, on an uncensored local model, to break this habit with constant reminders that it's okay to disagree or say it doesn't know, but it always leans toward "give the human what they want to hear" mode.

rhaedas

Your videos are the best in the AI domain. Just when I'm doing my own research, reading the papers, and formulating my impressions, you release a video that often mirrors my own impressions.

Thanks as always!

leegaul

I have been waiting for this notification. He only uploads if something important has happened.

Landgraf

Thanks for your thoughtful analysis as always Philip 👏🏿♥️. Looks like OpenAI is back to shipping something cool. Only a matter of time before Anthropic launches something big

solaawodiya

Love your videos! You're hands down the best AI channel, as big a leap as the one from GPT-3.5 to o1-ioi. Straight to the point, without the noise. Keep up the great work!

Kleddamag

I'm quite impressed with the o1 preview, but I'm a little bewildered about how to use it properly.

keeganpenney

In the movie "Her", the AI that powered Samantha was called OS1, and knowing how much Altman likes that movie...

ScientiaFilms