No AGI without Neurosymbolic AI by GaryMarcus

Gary Marcus

No AGI (and no Trustworthy AI) without Neurosymbolic AI

Ontology Summit 2024
6 March 2024
Comments

Gary Marcus is absolutely terrible at arguing a valid point about the state of AI. He constantly shows instances of these models screwing something up at inference, which all the scale-bros will just refute with their next beefed-up language model. He needs to stop pointing at bad inference; it makes him an easy target to dunk on when these models inevitably get trained to fix those exact examples. He needs to focus publicly on the flaws of the LM architecture alongside his examples of bad inference. I can't say whether his neuro-symbolic approach is the key to his "AGI", but his overall criticisms are mostly valid.

melkenhoning

"LLMs cannot achieve a certain level of abstraction, knowledge, agency, and reasoning ability due to inerrant limitations in the presentation of data, architecture, and methods of information propagation."

This is not a direct quote from the presentation, but I feel that this is a summary of Mr. Marcus' opinions as presented in this talk.

I'm... not entirely sure this is "right", but it may be "correct".

A lot of the examples given in this presentation were, in my opinion, akin to cheap shots; a good example is the Sora video in which the liquid, for lack of a better phrase, simply vacates the glass through a solid barrier.

The ability to have water behave as well as it did through entirely "latent" means was remarkable in and of itself, so I don't feel it's totally fair to be reductive about a hiccup and claim "there is no way this system, even if scaled, will ever produce a reasonable understanding of the world". Taking into account the world we live in, one would also have to argue "due to quantum tunneling, it's inconceivable we could have solid objects" in order to maintain an internally consistent worldview. Obviously anyone reading this comment is already benefiting from the fact that scale in emergent systems can produce reliable behavior (it's what lets you rest your hand on a mouse that functions as though it were a solid barrier).

It's worth noting that even Sora probably isn't remotely near the number of neurons and synapses of a human brain, and given the improvements we've seen in information density from sparse architectures, distillation, and quantization, I don't think it's unreasonable to propose that a version 10x the size of Sora (in its quantized state), distilled from a variant 100x that size, could have a stronger latent understanding than we're seeing now. Much as quantum tunneling is evened out at larger scales, where the probabilities stabilize, I think many of Sora's inexplicable behaviors would be evened out too.

But another major point: MLP blocks are universal function approximators. Yes, there are levels of abstraction these models may have yet to grasp, and yes, it would be nice to encode a bootstrapped understanding of those ideas directly in the architecture, but given sufficient examples of a concept, and the neurons to model it, I would posit that you will find those ideas to simply be another unknown function.
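To make the universal-approximation point concrete, here is a minimal sketch, entirely my own toy example rather than anything from the talk: a plain two-hidden-layer MLP fit to samples of an arbitrary nonlinear function it was told nothing about.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 1))           # inputs sampled from the domain
y = np.sin(3 * X[:, 0]) + 0.3 * X[:, 0] ** 2     # an "unknown function" to recover

# A generic MLP with no knowledge of sines or parabolas built in.
mlp = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000, random_state=0)
mlp.fit(X, y)

X_test = np.linspace(-3, 3, 7).reshape(-1, 1)
for x, pred in zip(X_test[:, 0], mlp.predict(X_test)):
    target = np.sin(3 * x) + 0.3 * x ** 2
    print(f"x={x:+.2f}  target={target:+.3f}  mlp={pred:+.3f}")
```

The network only ever sees input-output samples, which is the sense in which a sufficiently exemplified concept is "just another function" to fit.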

It's worth noting that at almost every stage of the development of neural networks as a field of study, people have thought "Oh, we have to encode a human-level understanding of these ideas into the structure of the network", and it's only with modern scales of data and neural networks that we've achieved it, not necessarily with architectural improvements. If you go back and replicate research from studies ten, twenty, forty, seventy years old, I think you'll find that the techniques in those studies, when scaled to modern levels, are far more capable and "modern" than you would expect, simply due to scale.

But I think there's going to be a fair bit of skepticism about this comment, so I'd like to address a few ideas that come to mind.

"We don't have the data to encode an understanding of these high level ideas, because it's not exactly something found naturally on the internet, and even when it is, it's difficult to annotate that example in a way that could tech that high level understanding to an LLM"

Sure. Looking purely at language, it can be difficult to convey the nature of, for instance, transparent objects, and notably transparent objects as they relate to theory of mind. So don't convey it with language. Multimodality will likely offer a huge advantage in general world knowledge, and is probably a step closer to AGI, and we have a variety of modern software tools (Blender and so on) that can convey various important ideas. We can also run simulations in game engines for things like theory of mind, which could, for instance, keep track of which objects a specific figure has seen. I also think that multi-turn training, where LLMs interact with one another (or with instances of themselves) on things like theory of mind, will be a big step, in the sense that the performance we're seeing now comes from models that haven't really "interacted" during their training phase; they've only "learned passively", to put it in human terms.
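As an illustration of the kind of simulation-generated data mentioned above, here is a minimal sketch, my own toy construction rather than anything from the comment: a simulator that tracks what each agent has actually observed and emits Sally-Anne-style false-belief questions with ground-truth answers.

```python
import random


def generate_example(rng: random.Random) -> dict:
    """One Sally-Anne-style false-belief example with a ground-truth label."""
    locations = ["basket", "box", "drawer"]
    start = rng.choice(locations)
    beliefs = {"Sally": start, "Anne": start}   # both agents observe the initial placement

    # Sally leaves; Anne moves the ball while Sally cannot observe it,
    # so only Anne's belief gets updated.
    new_loc = rng.choice([loc for loc in locations if loc != start])
    beliefs["Anne"] = new_loc

    story = (f"The ball is in the {start}. Sally leaves the room. "
             f"Anne moves the ball to the {new_loc}. Sally returns.")
    return {"story": story,
            "question": "Where will Sally look for the ball?",
            "answer": beliefs["Sally"],          # the false belief: the original location
            "true_location": new_loc}


rng = random.Random(0)
for _ in range(3):
    print(generate_example(rng))
```

A game-engine version of this could generate such labelled examples at scale, which is the point about synthetic data for theory-of-mind training.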

"LLMs don't understand things! They only do statistical analysis"

All right, sure. So why is it that they can generalize to moves that they haven't seen in chess? Why can they produce a relative understanding of the chess board from one-dimensional data? Why can you remove all instances of addition equaling a certain number and still find that the model can perform that addition? Why can they produce a higher-dimensional understanding of lower-dimensional data?

Because it's the most efficient way to model those predictions accurately. My suspicion is that this is related to "Grokking".
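The grokking literature usually tests a related but slightly different setup: hold out specific operand pairs, rather than an entire sum, and check whether the network has learned the rule instead of memorizing the table. A minimal sketch of that evaluation setup, with illustrative parameters and no guarantee that this tiny model actually groks:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

P = 23  # modulus for a + b (mod P)
pairs = np.array([(a, b) for a in range(P) for b in range(P)])
labels = (pairs[:, 0] + pairs[:, 1]) % P


def one_hot(pairs: np.ndarray, p: int) -> np.ndarray:
    """Encode each (a, b) pair as two concatenated one-hot vectors."""
    x = np.zeros((len(pairs), 2 * p))
    x[np.arange(len(pairs)), pairs[:, 0]] = 1
    x[np.arange(len(pairs)), p + pairs[:, 1]] = 1
    return x


# Hold out a quarter of the operand pairs; the model never sees them in training.
X_train, X_test, y_train, y_test = train_test_split(
    one_hot(pairs, P), labels, test_size=0.25, random_state=0)

model = MLPClassifier(hidden_layer_sizes=(256,), alpha=1e-3,
                      max_iter=5000, random_state=0)
model.fit(X_train, y_train)
print("accuracy on operand pairs never seen in training:",
      model.score(X_test, y_test))
```

Above-chance accuracy on the held-out pairs is evidence of the rule being modeled rather than memorized, which is the behavior the questions above are pointing at.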

It's also worth remembering that it's probably possible to break these ideas down into tiers of difficulty to understand, and the tier nearest to the current capabilities can probably be modeled in answers to prompts from humans, so we can probably produce synthetic data (which by its nature is remarkably well labelled and numerous) to solve a variety of high level understanding problems.

"Look at this capability this model still doesn't have! It'll never achieve it!"

Models will also never be able to write stories, tell jokes, function as agents, paint pictures, produce video, or annotate whether a bird is in an image or not. These are all capabilities that were said to be impossible, and have become possible over time, to varying degrees.

Yes, it's possible that there are major things that they can't achieve currently, but in the face of the massive level of progress that we've seen, especially recently, I'm not terribly inclined to bet against neural networks achieving many things (though in the interest of being fair to a speaker who likely will never have the opportunity to defend himself against this comment, I'll allow that preventing a divorce resulting from an argument about the color of the blinds in a Home Depot may be one of them).

It might be that we hit a scaling wall beyond which more parameters cannot accurately convey information across long-range dependencies, so that, for instance, it's not possible to move information satisfactorily from neuron 1 to neuron 1 trillion, and no scaling beyond that will help. (I could actually get behind this opinion; it would explain why mixture of experts is so effective at larger parameter counts.) But even if that's the case, I'm fairly confident that something can be solved in the workflow instead.

Nobody said that these models have to produce the correct answer to a prompt in a single shot.

What if it thinks about it in steps? What if it prepares a prompt for itself? What if it externalizes reasoning and mathematics to code? Can it produce images to visually reason through a problem? Can it produce thousands of responses and evaluate each one until it finds the right approach? Can it break the problem down into steps and solve it one by one in a multi-turn manner (not a simple Chain of Thought prompt)?

There are problems that humans can't solve in the single-shot way we expect LLMs to. What happens when they have access to the same tools and autonomy that we do? What happens when they have a "sticky note"?
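As a concrete illustration of that kind of workflow, here is a minimal sketch of a sample-and-verify loop; sample_candidate is a hypothetical stand-in for a real model call, and the arithmetic check stands in for externalizing reasoning to code. None of this is from the talk.

```python
import random


def sample_candidate(question: str, rng: random.Random) -> str:
    """Placeholder for an LLM call: returns a Python expression purporting to
    answer the question. A real system would query a model API here."""
    return rng.choice(["17 + 24", "24 ** 2 - 17", "17 * 24"])


def verify(candidate_expr: str, expected: int) -> bool:
    """Externalized check: run the candidate's arithmetic in code instead of
    trusting the model's own token-by-token calculation."""
    try:
        return eval(candidate_expr, {"__builtins__": {}}) == expected
    except Exception:
        return False


def best_of_n(question: str, expected: int, n: int = 20):
    """Sample up to n candidates and keep the first one that survives verification."""
    rng = random.Random(0)
    for _ in range(n):
        candidate = sample_candidate(question, rng)
        if verify(candidate, expected):
            return candidate
    return None  # no candidate checked out


print(best_of_n("What is 17 times 24?", expected=408))
```

The point of the skeleton is only that the single forward pass is no longer the unit of evaluation; the loop, the external check, and the retries are.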

I'm not sure that we need anything more than refinement.

To reiterate:

I think you'll find that most failures of AI models going forward will be because they just didn't have the quantity of data, quality of data, or parameters to model "the last function".

novantha

OK, but Claude 3 Opus is actually insane at generalizing its knowledge to reason intelligently.

dg-ovcf

My new favorite science channel. Great content.

veganphilosopher

This talk is better than valium for a Doomer. Temporarily.

rightcheer

Story time.
When I was a kid, I used to read lots of science books, whether I understood them or not. The books were meant for people more advanced in those fields. So sometimes my teachers and my friends used to get curious and ask me about what I had learnt. I had a vague understanding of the subject then, and I used to tell them this or that. Most of my answers were believable but not quite true. It was like a game of playing detective: I knew some facts and I tried to fit a story onto them. But people had established the facts before me, facts that held true. Perhaps I could have modified my stories had I had access to all of the facts. It maybe would not have made as interesting a story, but it would have been a true one. I think the problem with today's AI is the same, only it has all the facts but not their relation to one another. In the case of image generation, it has to understand, or be programmed to understand, that it cannot simply work from the facts: how true things are arranged in a sentence can change the anticipated output into a statement that is untrue but desired. The AI is capturing a bunch of objects in the prompt and fitting them into the relations it has been trained on (a horse riding an astronaut in a bowl of fruit).
In the case of the historically inaccurate astronauts, it is simply a case of bad instruction: the model has been instructed to be diverse regardless of what you prompted it to do. Simply increasing the weights for the trigger phrase 'historically accurate' will not solve it, because that will also trigger the neurones which control, for example, the background; if the prompt were for historically accurate astronauts in a mediaeval setting, it would produce hot garbage.

lucid_daydream_

This is exactly what I've been thinking, and trying to dabble in myself as someone with a lot of coding experience (right now mostly limited to playing around with building a proper parser that parses natural language using an _ambiguous_ formal grammar combined with a neural disambiguator). The sense I get is that neural components should be used to create "articulations" between components of an otherwise-engineered AI system, at the points where things could be expected to get fuzzy.

For example, consider an object recognizer. Ideally it should not just be a "black box". You should be able to supply an explicit polygon mesh model or the like indicating what the thing it's supposed to recognize "should" look like "prototypically". In reality, of course, a concrete instance of the object will not be _exactly_ the polygon mesh; think a real teapot vs. the archetype given by the "Utah teapot" model in classical computer graphics. The neural part would bridge that gap, by learning not how to recognize each object anew, but how to "deform" or "warp" _any given_ prototype to fit it to a candidate picture. Heck, we might envision a multi-stage system, where one neural stage deforms the mesh geometry, then it is passed to a conventional render engine to generate a comparison picture, then a second stage tweaks that comparison image at the pixel level for things like light, shadow, and occlusion. Even that could obviously be factored out still further: with a shader in the pipeline, the renderer could just generate lighting in the comparison image, and another neural network would add and position light sources around the object.

Once you have such a system, recognizing new objects is as simple as adding more poly mesh files, which is exactly how it should be (and would likely result in a dramatically smaller data load), and the operation is mostly transparent: you can even tap the neural output to see just how it is deforming the object to try to recognize it. The trick is to find the "sweet spot" of how much is "programmed" and "engineered" by conventional software engineering versus how much is "trained" into the neural joints between components, so as to minimize data demands (thus allowing, say, 100% sourcing of data from ethical sources) while maximizing performance.
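For what that pipeline could look like in miniature, here is a rough, self-contained sketch under heavy simplifying assumptions: a ring of 2-D points stands in for the polygon mesh, a Gaussian "splatting" step stands in for the conventional render engine, and a small network supplies the deformation. All names and choices here are mine, purely illustrative.

```python
import torch
import torch.nn as nn

GRID = 32  # resolution of the comparison image


def splat(points: torch.Tensor, sigma: float = 0.08) -> torch.Tensor:
    """Fixed 'renderer': turn (N, 2) points in [0, 1]^2 into a GRID x GRID image."""
    ys, xs = torch.meshgrid(torch.linspace(0, 1, GRID),
                            torch.linspace(0, 1, GRID), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                        # (GRID, GRID, 2)
    d2 = ((grid[None] - points[:, None, None, :]) ** 2).sum(-1)
    return torch.exp(-d2 / (2 * sigma ** 2)).sum(0).clamp(max=1.0)


class DeformNet(nn.Module):
    """Neural 'articulation': predicts per-vertex offsets of the engineered prototype."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, prototype: torch.Tensor) -> torch.Tensor:
        # Bounded offsets keep the deformed shape close to the prototype.
        return prototype + 0.2 * torch.tanh(self.net(prototype))


# Explicit prototype: points on a circle. Observation: a squashed version of it.
angles = torch.linspace(0, 2 * torch.pi, 24)
prototype = 0.5 + 0.3 * torch.stack([angles.cos(), angles.sin()], dim=-1)
observed = splat(0.5 + torch.stack([0.35 * angles.cos(), 0.15 * angles.sin()], dim=-1))

deform = DeformNet()
opt = torch.optim.Adam(deform.parameters(), lr=1e-2)
for step in range(300):
    opt.zero_grad()
    loss = ((splat(deform(prototype)) - observed) ** 2).mean()  # rendered vs observed
    loss.backward()
    opt.step()

print("final reconstruction error:", loss.item())
# The per-vertex offsets, deform(prototype) - prototype, remain inspectable,
# which is the transparency argued for above.
```

Swapping in a real mesh, a real render engine, and a second pixel-level stage changes the components but not the shape of the system: an engineered prototype, neural joints, and an interpretable deformation you can read off.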

shimrrashai-rcfq

Can the transformer network behind LLMs not develop a neurosymbolic system within itself in order to predict the next token?

mitchdg

Is this basically discussing the notion of AIs running on neuromorphic architectures, as opposed to LLMs running on classic computer architectures?

mnrvaprjct

LeCun would agree with you that simply scaling up language models is NOT the future of AI.

justinlloyd

Thanks for sharing this discussion with us, Dr. Marcus. As an urban farmer who has grown up with drones and plans to use them to operate my family's farm after university, this is a very pressing issue for me. I am actually more optimistic than you are with regard to the timeline for the "revelation" of AGI, but your perspective as an expert is invaluable. Stay visionary 🚀

houseofvenusMD

I think you could use the principles of Otter AI in a more general AI. Otter AI listens to your conversation with another person and is able to tell you anything about that conversation. It doesn't add anything or take anything away. This property of only telling you what was said could be extended to telling you anything relating to a given subject, and only what is known about that subject. This avoids potential hallucinations.
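A minimal sketch of that "only report what is known" principle, as I read it (my own toy interpretation, not Otter's actual design): retrieve a supporting sentence from a fixed transcript and refuse when nothing in the source addresses the question.

```python
import re

TRANSCRIPT = [
    "The meeting starts at 10am on Thursday.",
    "Dana will prepare the budget summary.",
    "The venue is the third-floor conference room.",
]


def keywords(text: str) -> set:
    """Crude keyword extraction: lowercase, letters only, drop short filler words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}


def answer(question: str, transcript: list) -> str:
    q = keywords(question)
    score, best = max((len(q & keywords(s)), s) for s in transcript)
    if score == 0:
        return "Not covered in the conversation."  # refuse rather than guess
    return best  # answer strictly by quoting the recorded source


print(answer("When does the meeting start?", TRANSCRIPT))   # quotes the transcript
print(answer("What is the weather tomorrow?", TRANSCRIPT))  # refuses: not in the source
```

A real system would use a proper retriever and a generator constrained to the retrieved text, but the refusal path is the part that limits hallucination.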

cil_Tlt

About the glass breaking and the basketball example: maybe when it creates an environment like that, we could program it like the physics engine of a game.

lucid_daydream_

"There is no physics theory where the objects can spontaneously appear and disappear"
Well, I have some news for ya...

true_xander

Not enough people seem to understand that the currently popular batch of AI is essentially just leveraging very very complex averaging. Or maybe it's just that too many people think that's what intelligence is.

loren

Interesting ideas for sure. I think some new breakthrough definitely needs to occur.

stfu_ayden

This video has confirmed my own thoughts and analysis. I've been reading and studying linguistics and philosophy for too long... from a computational paradigm, this all just seems obvious?

OpenSourceAnarchist

These symbolic people never stop. How long will it be before they realize this is a dead end?

igvc

I agree with the sentiment, but I would also disagree with how far it can take us. I can easily see AGI version 1 being an LLM, with future versions that have better overall balance and function being hybrid models.

As far as calling it such... well, I expect the first AGI won't be called AGI, for various reasons.

That said, your complaints are warranted. You can tell when it doesn't fully grasp something.

ChaoticNeutralMatt

I think we should redefine AGI as something that learns how to handle chaos, or at least tries to, because as humans we too have a problem handling chaos. So, something like us or better.

Goddamn! I said the same thing as this guy.

lucid_daydream_