It's Not About Scale, It's About Abstraction

François Chollet discusses the limitations of Large Language Models (LLMs) and proposes a new approach to advancing artificial intelligence. He argues that current AI systems excel at pattern recognition but struggle with logical reasoning and true generalization.

This was Chollet's keynote talk at AGI-24, filmed in high quality. We will be releasing a full interview with him shortly; a teaser clip from it plays in the intro!

Chollet introduces the Abstraction and Reasoning Corpus (ARC) as a benchmark for measuring AI progress towards human-like intelligence. He explains the concept of abstraction in AI systems and proposes combining deep learning with program synthesis to overcome current limitations. Chollet suggests that breakthroughs in AI might come from outside major tech labs and encourages researchers to explore new ideas in the pursuit of artificial general intelligence.

MLST is sponsored by Tufa Labs:
Are you interested in working on ARC and cutting-edge AI research with the MindsAI team (current ARC winners)?
Focus: ARC, LLMs, test-time compute, active inference, System 2 reasoning, and more.
Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.

TOC
1. LLM Limitations and Intelligence Concepts
[00:00:00] 1.1 LLM Limitations and Composition
[00:12:05] 1.2 Intelligence as Process vs. Skill
[00:17:15] 1.3 Generalization as Key to AI Progress

2. ARC-AGI Benchmark and LLM Performance
[00:19:59] 2.1 Introduction to ARC-AGI and the ARC Prize
[00:23:35] 2.2 Performance of LLMs and Humans on ARC-AGI

3. Abstraction in AI Systems
[00:26:10] 3.1 The Kaleidoscope Hypothesis and Abstraction Spectrum
[00:30:05] 3.2 LLM Capabilities and Limitations in Abstraction
[00:32:10] 3.3 Value-Centric vs Program-Centric Abstraction
[00:33:25] 3.4 Types of Abstraction in AI Systems

4. Advancing AI: Combining Deep Learning and Program Synthesis
[00:34:05] 4.1 Limitations of Transformers and Need for Program Synthesis
[00:36:45] 4.2 Combining Deep Learning and Program Synthesis
[00:39:59] 4.3 Applying Combined Approaches to ARC Tasks
[00:44:20] 4.4 State-of-the-Art Solutions for ARC

[0:01:15] Abstraction and Reasoning Corpus (ARC): AI benchmark (François Chollet)

[0:05:30] Monty Hall problem: Probability puzzle (Steve Selvin)

[0:06:20] LLM training dynamics analysis (Tirumala et al.)

[0:10:20] Transformer limitations on compositionality (Dziri et al.)

[0:10:25] Reversal Curse in LLMs (Berglund et al.)

[0:19:25] Measure of intelligence using algorithmic information theory (François Chollet)

[0:20:10] ARC-AGI: GitHub repository (François Chollet)

[0:22:15] ARC Prize: $1,000,000+ competition (François Chollet)

[0:33:30] System 1 and System 2 thinking (Daniel Kahneman)

[0:34:00] Core knowledge in infants (Elizabeth Spelke)

[0:34:30] Embedding interpretive spaces in ML (Tennenholtz et al.)

[0:44:20] Hypothesis Search with LLMs for ARC (Wang et al.)

[0:44:50] Ryan Greenblatt's high score on ARC public leaderboard
Comments

This guy may be the most novel person in the field. So many others are about scale, both AI scale and business scale. This guy is philosophy and practice. Love it!

therobotocracy

François Chollet is a zen monk in his field. He has an Alan Watts-like perception of the nature of intelligence, combined with deep knowledge of artificial intelligence. I bet he will be at the forefront of solving AGI.
I love his approach.

abhishekgehlot

Amongst the hundreds of videos I have watched, this one is the best. Chollet very clearly (in abstract terms!) articulates where the limitations of LLMs lie and proposes a good approach to supplementing their pattern matching with reasoning. I am interested in using AI to develop human intelligence and would love to learn more from such videos and people about their ideas.

PrasadRam-xr

Finally, someone who puts into words my intuition after working with AI for a couple of months.

boudewyn

“Mining the mind to extract repetitive bits for usable abstractions”: awesome. The kaleidoscope analogy is great.

SmirkInvestigator

13:42 “Skill is not intelligence. And displaying skill at any number of tasks does not show intelligence. It’s always possible to be skillful at any given task without requiring any intelligence.”

With LLMs we’re confusing the output of the process with the process that created it.

YannStoneman

6:31 even as of just a few days ago … “extreme sensitivity of [state of the art LLMs] to phrasing. If you change the names, or places, or variable names, or numbers… it can break LLM performance.” And if that’s the case, “to what extent do LLMs actually understand? … it looks a lot more like superficial pattern matching.”

YannStoneman

One thing I really like about Chollet's thoughts on this subject is using DL both for perception and for guiding program search, in a manner that reduces the likelihood of entering the 'garden of forking paths' problem. That problem, by the way, is extraordinarily easy to stumble into and hard to get out of, but remediable. As for combining solid competency in one or more reasoning subtypes, perhaps together with other facets of reasoning (e.g. those learned through experience, particularly under uncertainty), to guide the search during inference: I believe this is a reasonable take on developing a more generalized set of abilities for a given AI agent.

pmiddlet
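
That combination is easy to sketch in miniature. Below is a minimal, entirely hypothetical illustration of neural-guided program search in the spirit of the comment above: a toy DSL of grid operations, a stand-in score_candidate where a trained network's guidance would go, and a made-up single-pair task (none of this is from the talk):

from itertools import product

import numpy as np

# Toy DSL of grid transformations (hypothetical; real ARC DSLs are far richer).
DSL = {
    "flip_h": lambda g: np.fliplr(g),
    "flip_v": lambda g: np.flipud(g),
    "rot90":  lambda g: np.rot90(g),
    "invert": lambda g: g.max() - g,
}

def score_candidate(ops):
    """Stand-in for a learned model scoring how promising a candidate
    program looks; here just a trivial prior favoring shorter programs."""
    return -len(ops)

def search(train_pairs, max_depth=3):
    """Enumerate DSL programs best-first by the (learned) score and return
    the first one consistent with every training input/output pair."""
    candidates = [ops for d in range(1, max_depth + 1)
                  for ops in product(DSL, repeat=d)]
    for ops in sorted(candidates, key=score_candidate, reverse=True):
        def run(g, ops=ops):
            for name in ops:
                g = DSL[name](g)
            return g
        if all(np.array_equal(run(x), y) for x, y in train_pairs):
            return ops
    return None

# Made-up task: the output is the input mirrored left-to-right.
pairs = [(np.array([[1, 2], [3, 4]]), np.array([[2, 1], [4, 3]]))]
print(search(pairs))  # -> ('flip_h',)

In a real system the scorer would be a model conditioned on the task examples, which is exactly what keeps the forking paths of the combinatorial search in check.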

Great presentation. Huge thank you to MLST for capturing this.

simonstrandgaard

Exactly what I needed - a grounded take on AI.

TechWeekly

The process of training an LLM *is* program search. Training uses gradient descent to search for programs that produce the desired output. The benefit of neural networks over traditional program search is that they allow fuzzy matching: small differences won't break the output entirely but only deviate slightly from the desired result, so gradient descent can be used effectively to find the right program.

descai
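
A toy illustration of that framing (a made-up example, not from the talk): treat every function f(x) = w * x as a "program" indexed by the parameter w, and let gradient descent search that continuous program space for the one that maps 3 to 6.

# "Program space" = all functions f(x) = w * x, indexed by w.
# Gradient descent searches it for the program mapping 3 -> 6.
x, target = 3.0, 6.0
w = 0.0                             # initial (wrong) program
lr = 0.01
for _ in range(500):
    pred = w * x                    # run the current program
    grad = 2 * (pred - target) * x  # d/dw of the squared error
    w -= lr * grad                  # nudge toward a better program
print(round(w, 3))                  # -> 2.0: "multiply by two"

The fuzziness is what makes this search tractable: a slightly wrong w produces a slightly wrong output, so the loss surface offers a gradient to follow, which discrete program enumeration does not.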

I like Chollet (despite being team PyTorch, sorry), but I think the timing of the talk is rather unfortunate. I know people are still rightfully doubtful about o1, but there is still quite a gap between its ability to solve problems like those discussed at the beginning of the video and that of previous models. It also does better on Chollet's own benchmark ARC-AGI*, and my personal experience with it also sets it apart from classic GPT-4o. For instance, I gave the following prompt to o1-preview:

"Wt vs vor obmhvwbu qcbtwrsbhwoz hc gom, vs kfchs wh wb qwdvsf, hvoh wg, pm gc qvobuwbu hvs cfrsf ct hvs zshhsfg ct hvs ozdvopsh, hvoh bch o kcfr qcizr ps aors cih."

The model thought for a couple of minutes before producing the correct answer (it is a Caesar cipher with shift 14, but I didn't give the model any context). 4o just thinks I've written a lot of nonsense. Interestingly, Claude 3.5 knows the answer right away, which makes me think it is more familiar with this kind of problem, in Chollet's own terminology.

I'm not going to paste the output of o1's "reasoning" here, but it makes for an interesting read. It understands immediately that some kind of cipher is being used, but then attempts a number of techniques (including the classic per-letter frequency count mapped against standard English frequencies) and breaks down the words in various ways.


*I've seen claims that there is little difference between o1's performance and Claude's, which I find jarring. As a physicist, I've had o1-preview produce decent answers to a couple of mini-sized research questions I've had this past month, while nothing Claude can produce comes close.

fabim.
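
For what it's worth, the cipher in the comment above is easy to verify mechanically. A short brute-force sketch over all 26 shifts (the ciphertext is copied verbatim from the comment):

def caesar_shift(text, k):
    """Shift each letter back by k positions (decryption)."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

ciphertext = ("Wt vs vor obmhvwbu qcbtwrsbhwoz hc gom, vs kfchs wh wb "
              "qwdvsf, hvoh wg, pm gc qvobuwbu hvs cfrsf ct hvs zshhsfg "
              "ct hvs ozdvopsh, hvoh bch o kcfr qcizr ps aors cih.")

# Try every shift and eyeball the output; k = 14 reads as English:
# "If he had anything confidential to say, he wrote it in cipher..."
for k in range(26):
    print(k, caesar_shift(ciphertext, k)[:40])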

I had always assumed that LLMs would just be the interface component between us and future computational ability. The fact that they have a decent grasp of many key aspects is a tick in the box. Counter to the statement on logical reasoning: how urgently is it needed? Pair us with an LLM to fetch and summarise information, and we decide. An LLM's ability to come up with variations (some sensible, others not) in the blink of an eye is useful. My colleagues and I value the random nature of the suggestions; we can use our expertise to take the best of what it serves up.

gdr

So he uses applied category theory to attack the hard problems of reasoning and generalization without ever uttering the words "category theory" (so as not to scare investors or researchers with abstract nonsense). I like this a lot. What he proposes corresponds to "borrowing arrows" that lead to accurate out-of-distribution predictions, as well as finding functors (arrows between categories) and natural transformations (arrows between functors) to solve problems.

FamilyYoutubeTV-xd

Excellent speech; François Chollet never disappoints me. You can see the mentioned "logical breaking points" in every LLM nowadays, including o1 (which is a group of fine-tuned LLMs). If you look closely, all the results are memorized patterns. Even o1 has some strange "reasoning" going on, where you can see it gets the result right but doesn't get why the result is right; I think this is partly why they don't show the "reasoning steps". This implies these systems are not ready to be deployed on important tasks without supervision by a human who knows what the result should look like, and are therefore only usable on entry-level tasks in narrow result fields (like an entry-level programmer).

szebike

The only talk that dares to mention the 30,000 human laborers ferociously fine-tuning the LLMs behind the scenes after training, fixing mistakes as dumb as "2 + 2 = 5" and "there are two Rs in the word strawberry".

BinoForLyfe

This is a guy who's going to be among the authors/contributors of AGI.

l.halawani

Back-to-back banger episodes! Y'all are on a roll!

thedededeity

While it's crucial to train AI to generalize and become information-efficient like the human brain, I think we often forget that humans got there thanks to vastly more data than what AI models are exposed to today. We didn't start gathering information and learning from birth—our brains are built on billions of years of data encoded in our genes through evolution. So, in a way, we’ve had a massive head start, with evolution doing a lot of the heavy lifting long before we were even born.

whoami