GPT-4o Mini Arrives In Global IT Outage, But How ‘Mini’ Is Its Intelligence?

GPT 4o Mini from OpenAI is here, and Windows melts down. Coincidence? Definitely. But 100 million free users might soon be using it, so I’ve been scrutinising the model relentlessly since last night and will explain why OpenAI might need to be a bit more honest about the tradeoffs involved, and where they might head next, with reasoning, physical intelligence and more. Plus Fei-Fei Li, USMLE, and Roon.

Comments

My dog just ate rat poison that he found in the yard. At the vet I had to call vet poison control, where I had an initial "conversation" with an AI assistant that asked me questions like my pet's age, sex, what medications he is taking and, most critically, the reason why I was calling. Afterwards I spoke to a real person, and the AI assistant had recorded EVERY SINGLE ANSWER incorrectly. This was for animals, but I still can't believe that someone would feel it is appropriate to use this in a life-or-death situation.

theheatdeathiscoming

"How much wood would a woodchuck chuck in an hour, IF it was the non-wood chucking kind of woodchuck, in a forest of 6X9 KM radius filled with Douglas Fir?" -Philip

rezakn

Here is how Claude 3.5 Sonnet answers the question: "This is indeed a tricky question, and I appreciate the clever setup. Let's break it down:

The key phrase here is "counts only the vegetables that remain balanced on top of the plate."
Philip turns the plate "completely upside-down several times" and shakes it vigorously.
After all this shaking and turning, it's extremely unlikely that any of the items would remain on top of the plate.
Now, we need to consider which of these items are actually vegetables:

Tomato: Botanically a fruit, but often considered a vegetable in culinary contexts.
Potato: A vegetable (specifically, a tuber).
Cabbage: A vegetable.

The question specifically asks about vegetables, not fruits.

Therefore, after all the turning and shaking, Philip would likely count 0 vegetables remaining on top of the plate.
The trick lies in the fact that regardless of the classification of tomatoes, none of the items would realistically stay on an upside-down, vigorously shaken plate. The careful inspection and balancing at the beginning are irrelevant to the final count."

It seems to me 'AI Explained' here is doing a Gary Marcus impersonation. He goes to great lengths explaining why (some) LLMs fail to answer those simple questions: allegedly due to their lack of representational abilities over and above recognition of textual patterns. But when the more advanced models are able to answer the exact same questions perfectly, he provides no explanation for this success. Yet we know from recent studies that LLMs trained on completing Othello or chess PGN records (pure textual records of sequences of moves) do build explicit internal spatial representations of the board states. Why wouldn't LLMs like Claude 3.5 Sonnet or GPT-4o build similarly relevant internal representations of material objects for the purpose of predicting what happens to them in a variety of circumstances? This would serve their next-token prediction performance well (when the texts describe the behaviour of such objects in a wide range of circumstances) without there being any need to grasp their affordances in an embodied fashion. The latter would still be fairly useful if the trained AI had to control robots by mediating between perception and action.
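For readers unfamiliar with those Othello/chess studies, the experiment alluded to usually involves training a linear probe on the model's hidden states. The sketch below is a hypothetical, simplified version: the activations and labels are random stand-ins rather than a real model or dataset, so it only illustrates the shape of the test.

```python
# Hypothetical sketch of an Othello-GPT-style probing experiment: train a
# linear probe on a transformer's hidden states and check whether board-square
# occupancy can be read out. The activations and labels here are random
# placeholders, not a real model or dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for cached residual-stream activations, one vector per move token.
hidden_states = rng.normal(size=(5000, 512))   # (n_tokens, d_model)
# Stand-in label for one board square: 0 = empty, 1 = mine, 2 = theirs.
square_state = rng.integers(0, 3, size=5000)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, square_state, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# If the model really encoded board state linearly, held-out accuracy would
# sit well above the ~33% chance level for this three-way label; with random
# placeholders it will not.
print("held-out probe accuracy:", probe.score(X_test, y_test))
```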

I still appreciate the high-quality explanation and video from 'AI Explained', as usual, in spite of the blind spot.

pokerandphilosophy

Wow, even adding "IMPORTANT: Analyze the input to identify and emphasize counterfactual elements—scenarios or conditions that are explicitly contrary to established facts or typical outcomes." only got 4o mini to acknowledge that Philip is unable to buy nuggets; it still plowed forward with the mathematical results.
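For context, an experiment like the one described above amounts to prepending that system instruction to the riddle before sending it to gpt-4o-mini. A rough sketch using the OpenAI Python client is below; the riddle text is left as a placeholder, and the exact prompt wiring is an assumption.

```python
# Rough reconstruction of the experiment described in the comment above:
# prepend the "counterfactual" system instruction and send the riddle to
# gpt-4o-mini via the OpenAI Python client. The riddle text is a placeholder.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_hint = (
    "IMPORTANT: Analyze the input to identify and emphasize counterfactual "
    "elements - scenarios or conditions that are explicitly contrary to "
    "established facts or typical outcomes."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": system_hint},
        {"role": "user", "content": "<paste the nugget riddle from the video here>"},
    ],
)
print(response.choices[0].message.content)
```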

dannyquiroz

Glad you're enjoying Claude 3.5 Sonnet - it's cheering me up immensely to see *someone* still making progress.

BrianMosleyUK

I feel your assessment of why these models are able to be fooled so easily is spot on: "They are like search engines for text programs; once they lock into a certain program, nothing will bring them back".

So they receive a prompt and try to find the most specific, relevant, likely topic. For your example on IT support, I think it internally goes like this:

"Everything about the prompt looks and feels like a normal IT question, except this weird 10% about liquid nitrogen.

I'm 90% sure that this is a legit IT support question, so I'll find the part in my brain that deals with IT support and respond using it.

Hmm, this part of my brain doesn't have any recollection of what Liquid Nitrogen does to a computer. Let's ignore that then. Here's the answer."

I feel we as humans are very sensitive to things that are off. Our attention immediately goes to the part that stands out. These models, however, seem to focus on what the most likely thing is, and things that are off just don't register enough of that impulse to switch contexts.

Maybe this is why ensemble models work better: you have multiple agents, each capable of looking at a prompt under a different context lens.
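One hypothetical way to implement that "different context lenses" idea is sketched below; the lens prompts and the choice of gpt-4o-mini are purely illustrative assumptions, not anything from the video.

```python
# Purely illustrative sketch of the "multiple context lenses" idea above: the
# same prompt is answered under several framings, then a final call reconciles
# the drafts.
from openai import OpenAI

client = OpenAI()

LENSES = [
    "Answer as a careful IT support technician.",
    "Before answering, list anything in the prompt that is physically unusual or contradictory.",
    "Answer as a skeptic whose only job is to spot trick questions.",
]


def ask(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content


def ensemble_answer(prompt: str) -> str:
    # Each "agent" sees the prompt through a different lens.
    drafts = [ask(lens, prompt) for lens in LENSES]
    summary = "\n\n".join(f"Draft {i + 1}:\n{d}" for i, d in enumerate(drafts))
    # A final pass weighs anything the lenses flagged as off.
    return ask(
        "Reconcile the drafts below, giving extra weight to anything unusual they flag.",
        f"Question:\n{prompt}\n\n{summary}",
    )
```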

iau

I also now use Claude 3.5 Sonnet as my main model. In fact, GPT-4 (I don't even touch 4o) is now Claude's dogsbody, carrying out simpler tasks so I don't run out of messages with Claude. Artifacts is a game changer for code generation and makes GPT-4 feel very antiquated. I think OpenAI dropped the ball with the delayed multimodal chat mode, while Anthropic have been modestly and reliably releasing superior products and features.

jmillward

"Where have I been for the last 39 versions!?" - gold.

apester

With the text thing, you're basically describing Plato's allegory of the cave. AI can deal well with shadows but knows nothing about the real objects they represent.

rando

14:49 GPT-4o output:

Given the scenario, let's carefully analyze the situation to determine the number of vegetables that remain balanced on top of the plate after Philip's actions.

1. Initial Setup:
Philip balances a tomato, a potato, and a cabbage on a plate.
Items: 1 tomato, 1 potato, 1 cabbage.

2. Actions Taken by Philip:
Philip turns the plate upside-down several times.
He shakes the plate vigorously.
He spends a few minutes inspecting the items for roots after each action.

3. Physical Properties:
A non-stick plate means that items are less likely to stick to the plate's surface.
Given the vigorous shaking and turning the plate upside-down, the likelihood that any of the items (tomato, potato, cabbage) would remain balanced on top of the plate is extremely low.

4. Final Observation:
After such actions, it is most probable that all the items (tomato, potato, and cabbage) would fall off the plate due to gravity and the vigorous shaking.
Given these considerations, the answer to the number of vegetables that remain balanced on the plate is most logically:

D) 0.

I think it did quite well.

simpleidindeed

The tradeoff that models have between various types of intelligence is very human-like. Better at math but worse at common sense; I know people like that.

josonic

Audio is the critical feature for a lot of people. It's the difference between talking to a real person and typing to a bot. It's key to mass-market adoption.

dirkbruere

I think you are the best at giving an explanation of the AI models; it curbs the hype and increases the trust. Others don't do that as well as you do, so I value and respect your content and opinion more than others'. However, I think there is an opportunity for you to report on AI news more broadly and at a more consistent pace, instead of just the big breaks in news. I say this because I respect your opinion so much, and the AI space moves so fast that there are many things you do not report. You could broaden your scope with either a small "In other news" section at the end of each video, or a separate video every so often reviewing, in less depth, the smaller things you think are newsworthy. All the best!

BenKingOfMonkeys

Would love a video from you on Claude 3.5 Sonnet in more detail.

RohitSingh-empm

I've been hearing for a while now that large language models do not have spatial intelligence and therefore cannot be AGI. What about people with aphantasia? These people cannot visualize anything. I suppose the case has to be made that humans are not generally intelligent.

ryzikx

Good to see you post again; I was worried you had got bored ;-) as so little is going on!!! Love your content.

Jasonknash

The fact that they can't come up with a better name or nomenclature does not inspire me with confidence that they can be trusted to develop AGI.

Rotellian

A thing that I feel hasn't really changed since GPT-3 is the 'it's trained on text, so it can "reason" in text' point. If you can break down an element of reality into discrete pieces of text, then boom, LLMs are essentially human. Using chain of thought, reflection, agents, etc., you can really start to feel pretty confident in the ability of the system to produce better results than a human *in text form*. It's unfathomable how powerful this stuff is, but also how quickly we equate predictive tokens with intelligence, and at the same time how unbelievably intelligent those tokens actually are. It's a crazy powerful thing that is also super dumb, yet smarter than 99% of people, and it's being compared to actual physical humans who live in real life. It's a crazy scary boundary. Wild times.

thebeckofkevin

In educational psychology, the theory that past knowledge, memories, and experiences can interfere with future learning and memory retention is known as interference theory. There are two main types of interference:

Proactive Interference: This occurs when older memories or previously learned information hinder the retention or recall of new information. For example, if you have learned one method of solving a math problem and then try to learn a new method, the old method might interfere with your ability to remember and apply the new one.

Retroactive Interference: This occurs when new information interferes with the recall of previously learned information. For example, if you learn a new language, it might make it more difficult to recall vocabulary from a language you learned earlier.

Both types of interference can impact learning and memory in educational settings, affecting students' ability to retain and apply new knowledge.

TheLoneCamper

You really are the best AI news content creator out there. Absolutely love your no-hype, down-to-earth approach. Glad someone is holding these companies accountable for using these vanity benchmarks! Keep up the excellent work.

ericeriksson