China DROPS AI BOMBSHELL: OpenAI Is WRONG!


0:00 Research Introduction
1:17 OpenAI Claims
2:32 Distribution Testing
3:48 Training Data
5:11 Model Behavior
6:45 Color Priority
8:15 Shape Transformation
9:44 Research Implications
11:23 Data Retrieval
13:16 Marcus Response
15:06 Pattern Matching
16:25 LeCun Theory
17:36 Architecture Differences
19:39 Meta Demonstration
20:40 Prediction Limits
22:26 V-JEPA Comparison
24:17 Final Thoughts
Links From Today's Video:

Welcome to my channel, where I bring you the latest breakthroughs in AI. From deep learning to robotics, I cover it all. My videos offer valuable insights and perspectives that will expand your knowledge and understanding of this rapidly evolving field. Be sure to subscribe and stay updated on my latest videos.

Was there anything I missed?

#LLM #Largelanguagemodel #chatgpt
#AI
#ArtificialIntelligence
#MachineLearning
#DeepLearning
#NeuralNetworks
#Robotics
#DataScience
Comments

The real achievement here was turning a 2-minute explanation into a 24-minute video.

internetwarrior

"OMG it's just advanced retrieval and not really smart!"
... Welcome to College.

drunktrump

OK, look. If an AI model only sees red circles and blue squares in its training data, you've taught it that red objects are always circles and blue objects are always squares. It learned that, and it treats it as a physical law. It is not unable to learn physical laws; it is just learning exactly what it is taught. If you teach it things that are wrong, it will believe them, and that is the fault of the teacher, not the student.

GeneralPublic
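A minimal Python sketch of the point above, using made-up toy data rather than anything from the paper: when colour and shape never disagree anywhere in the training set, the model gets no signal about which cue actually determines the outcome, so a red square leaves it genuinely undecided.

```python
# Toy sketch (hypothetical data, not the paper's setup): colour and shape are
# perfectly correlated in training, so nothing tells the model which cue matters.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Features: [is_red, looks_round]. Label: 1 = circle, 0 = square.
X = np.array([[1.0, 1.0],   # red circle
              [1.0, 1.0],   # red circle
              [0.0, 0.0],   # blue square
              [0.0, 0.0]])  # blue square
y = np.array([1.0, 1.0, 0.0, 0.0])

# Plain logistic regression fitted by gradient descent.
rng = np.random.default_rng(0)
w, b = 0.01 * rng.normal(size=2), 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= 0.5 * (X.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

print(sigmoid(np.array([1.0, 1.0]) @ w + b))  # red circle: confident, close to 1
print(sigmoid(np.array([1.0, 0.0]) @ w + b))  # red square: ~0.5 -- training never
                                              # said whether colour or shape wins
```

Here the correlation in the data, not the architecture, is what does the damage, which is the commenter's point.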

It's not retrieval. Training a tiny cartoon model and overfitting it on solid colours that only move left and right doesn't prove the architecture is incapable of learning physical laws; it proves that you're incapable of creating the dataset and hyperparameters that would allow it to do so.
This is nothing new either; this epiphany is something every beginner learned 20 years ago.
Obviously we know that the neural network in a security camera has learned the general features of faces, because otherwise it simply wouldn't be able to detect an intruder's novel face that wasn't in its training data, and because we've looked inside those models and literally visualized the features it learned.

However, you can do what they did here and train a face detector to overfit on specific faces, and then it won't learn the general features of faces; it will only be able to detect when one particular face is present.
You can also overfit on certain face shapes. If all the faces in your dataset are cartoonishly round, it's going to overfit on round faces and assume things about other face types that are only true of round faces. This doesn't prove that a properly trained model isn't learning how faces work, and it says nothing about the architecture in general. You need a diverse dataset that represents real-world physics for these complex abstractions to emerge.
It's like only ever teaching a child about addition and then when they fail a multiplication test you say "aha! see! they're incapable of learning math because they can't generalize".

steve_jabz

Nothing new here; this is what plenty of people have been saying about LLMs forever.

Also, they overfit a small model, which says basically nothing and is exactly what we would expect.

countofst.germain

You need to stop the "okay"; bring back the "pretty pretty".

kecksbelit

In the example where the red square turns into a circle at 09:45, what question did the researchers ask the AI beforehand? Did they ask anything specific, or did they just present it with a short video and no guided prompting? If no prompt was given, why can't the AI's response be interpreted as: "A red square must be a mistake. I've seen how the world works, and red squares should not exist. So first things first, I need to correct that and make it a circle"? If that is the only thing it has "experienced", then that is how it 'believes' the world should be, so it fixed it. Not unlike Plato's Allegory of the Cave. If the model were trained on more diverse data, would it make the same choice, or would it do as the researchers expected and keep the square a square?

CorvidGlass

I think this is already widely understood. If you train a model on video, it will "understand" how pixels change in videos. Video is just a visual representation of the physical world. Would you train a baby to walk, talk and function in the world by sitting it down and getting it to watch videos? Multimodality is what we "train on" as humans, so why expect anything less from an AI system?

rplumb

It’s funny when you think about it—many people argue that language is what has enabled us to become “intelligent.” Yet, some of these same people argue that language models themselves can’t be the key to creating intelligence. They might have valid points, and they might be saying things that are technically true, but I think they’re missing the larger point. We ourselves don’t fully understand what intelligence actually is, and we don’t all agree on a precise definition. And, really, who cares? The question isn’t whether we’re creating a true world using exact physics; it’s whether we’re creating a world that behaves in a certain way.

Some critics seem to be trying to judge these models by whether they take the same paths we do, like relying on physics engines in games. For example, if you have a VR game that’s incredibly realistic with a physics engine, it doesn’t “understand” anything. But it achieves the effect. So, I think these critics are missing the point—many of their arguments, even if technically correct, can be dismissed with a simple question: does it accomplish what we intended? It’s almost that simple.

Consider this: I don’t fully understand the physics of walking, yet I still walk. So, what exactly would their argument be there?

ClassicRiki

Sora understands physical law as it pertains to the input it has been given. You first have to define understanding properly. Why is this news? This is how our own brain works. Why do you think Aristotle, who was definitely intelligent, believed that the speed of a falling object was proportional to its weight, i.e. that heavier objects fall faster than lighter ones? He had a limited set of training data, and for him this was the sensible conclusion. That doesn't mean he hadn't inferred a general rule, just that he inferred the wrong one.

merdaneth

As I read it, this paper addresses a classic problem in statistical prediction: any machine learning or predictive model will always be biased towards predicting outcomes within its trained distribution. In simpler terms, models like LLMs are designed to predict the next letter, while models like Sora generate the next frame of an image or video, based on patterns they've seen before.

The paper argues that these models struggle with ‘out-of-distribution’ predictions, meaning they’re not effective at identifying what doesn’t fit their learned patterns.

The problem is not just the inverse of predicting the next likely outcome. It's a harder problem, because the model would need to account for an effectively infinite, multi-dimensional space of possibilities outside its training scope. Even a halfway solution would need something like a "Sora multiple-scenario outcome model", which hasn't been built yet and would require either an extreme scaling of the existing architecture to cover all scenarios (so that those scenario models could be folded into the "final Sora" alongside the rest of the data) or a completely different architecture.

Must say I initially tend to agree with the paper’s perspective.

mariusj
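A loose illustration of the in-distribution point above, using a character-level bigram counter as a stand-in for a "predict the next letter / next frame" model (nothing like the real architectures, just the statistical idea):

```python
# Sketch: a next-character model built from raw counts can only ever propose
# continuations it has already seen; a context outside its training
# distribution gives it nothing to work with.
from collections import Counter, defaultdict

corpus = "red circle red circle blue square blue square "
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_char_distribution(prev):
    """Normalised next-character frequencies observed after `prev`."""
    c = counts[prev]
    total = sum(c.values())
    return {ch: n / total for ch, n in c.items()} if total else {}

print(next_char_distribution("r"))  # in-distribution: roughly {'e': 0.67, 'c': 0.33}
print(next_char_distribution("z"))  # out-of-distribution: {} -- no basis to predict
```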

What is AGI?

To understand it better, try asking yourself: at what age could a child perform this task? This is the mindset you should adopt when considering AGI’s potential. And if you think AI is a deception or a con, it may help to consider the principles of evolution.

Is anyone thinking about what we genuinely need from a superintelligent AI, beyond just what humans might prefer? Do we truly want an AI to be the ultimate arbiter of truth? And if so, who decides what truth even is? Would it be defined by the victors in humanity’s long and complex history? Do we, as a species, need an Oz-like figure pulling the strings behind the curtain before we “awaken” to reality?

quietackshon

This is missing a big part of the picture. In dreams you can fly, objects can jump around, etc. Your human mental model doesn't have to conform to cause and effect or strictly to physics *unless* you are using it to interact with the real phenomenal world, in which case your body uses feedback mechanisms to correct the model to conform to what is actually happening in external physical reality. It doesn't matter if AI-generated videos hallucinate impossible things; that's a feature of a world model, not a bug. Your world model has to be close enough for government work, not perfect.

kalliste

This seems pretty dumb... if all the training data you have is red circles and blue squares, it's not surprising that the model falls apart when it sees a red square. That wouldn't happen as the training data grows and the model extracts general, essential patterns from apparently unrelated situations. Scale is the key.

Nico-diqo

You are getting it wrong. The models have world-shattering capabilities even without being "really" intelligent. The failures of "doing it from memory" instead of "doing it by logic" are evident in mathematical problems. However, they changed the whole infrastructure with AlphaProof and symbolic language, and won a silver medal at the Math Olympiad.

hanskrakaur

There's a massive difference between an LLM and a visual generation AI.

Within language, humans have already encoded and implicitly shown the relationships between the representative units (words). There is almost zero such relational data in visual data units (like moving squares).

Do not conflate LLM-style AI with video-generation AI. Max Tegmark recently put out a paper highlighting the relational, geometric nature of the information in an LLM, which represents implicit aspects of the world; but the model will still miss the things learned in the childhood phase that humans never need to discuss at length because of shared human experience.

There are explicit geometric relationships between similar concepts in an LLM. "Queen" is close to "woman" and "King" is close to "man", but those two pairs are far away from each other in the actual dimensional information encoded into neural networks trained on language. Those relationships are present in language, not in physical objects viewed through a camera.

shivameucci
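A toy illustration of the geometry described above, with hand-made 3-D vectors standing in for real learned embeddings (which have hundreds of dimensions and are fitted from text, not written by hand):

```python
# Illustrative only: tiny hand-crafted "embeddings" to show the kind of
# relational geometry the comment describes.
import numpy as np

vec = {                       # dims: [royalty, maleness, personhood]
    "king":  np.array([0.9,  0.8, 1.0]),
    "queen": np.array([0.9, -0.8, 1.0]),
    "man":   np.array([0.1,  0.8, 1.0]),
    "woman": np.array([0.1, -0.8, 1.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The analogy is a direction in the space: king - man + woman lands on queen.
target = vec["king"] - vec["man"] + vec["woman"]
print(max(vec, key=lambda word: cosine(vec[word], target)))   # -> queen
print(round(cosine(vec["king"], vec["man"]), 2),              # ~0.86: king near man
      round(cosine(vec["king"], vec["woman"]), 2))            # ~0.22: far from woman
```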

If a child were shown only red balls and blue squares and then one day were shown a red square, chances are their brain/sight would correct the shape or the colour to fit their expectations. There are a lot of adjustments going on when we see something. Obviously the technology behind the human brain is way more sophisticated than that of our generative AI, but the data poured into the human brain through our five senses also matters far more than a few billion parameters.

RWilders

I hate it when people use the word "understand" in reference to AI, because these models do not understand within the same paradigm that we do. When you ask whether a model understands, and then restrict it to our basis of contextual and experiential understanding, the answer at this point is always no.

jamieclark

Unbelievable. This is SO basic if you've ever really worked with neural networks, or even just simple regression stats!
1. INTERPOLATION can be reasonably dependable if you cover the input data space well.
2. EXTRAPOLATION is much harder. Rule of thumb: the further you get from covered input, and the more complex your model, the worse things get.
How on earth would an AI model know that form is more important than color? WE know that, because we have context. The model can simply minimize its error by complying with the known input. That's why LeCun et al. think they can "correct" this by adding context. The wall they'll hit is... what IS this context that you need?

BertGafas
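A rough numeric sketch of the interpolation/extrapolation rule of thumb above, using an ordinary polynomial fit as a stand-in for a trained model:

```python
# Fit a cubic to sin(x) on [0, pi]: inside the covered input range the error
# is small; outside it the fit has no idea what the function does.
import numpy as np

x_train = np.linspace(0, np.pi, 50)
coeffs = np.polyfit(x_train, np.sin(x_train), deg=3)

def abs_error(x):
    return abs(np.polyval(coeffs, x) - np.sin(x))

print(abs_error(np.pi / 2))   # interpolation: small (~0.02)
print(abs_error(2 * np.pi))   # extrapolation: orders of magnitude larger (~8)
```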

Yann LeCun has been saying this for 10+ years already. They tried for many years to train on video to build a model of the world. It didn't work, and it certainly won't work with synthetic video.

Eldobbeljoe