The ARC Prize 2024 Winning Algorithm

Daniel Franzen and Jan Disselhoff, the "ARChitects", are the official winners (with co-researcher David Hartmann) of the ARC Prize 2024. Filmed at Tufa Labs in Zurich, they revealed how they achieved a remarkable 53.5% accuracy by utilising large language models (LLMs) in creative new ways. Discover their innovative techniques, including depth-first search for token selection, test-time training, and a novel augmentation-based validation system. Their results were extremely surprising.
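To make the "depth-first search for token selection" idea concrete, here is a minimal, purely illustrative Python sketch (not the ARChitects' actual code): it assumes a hypothetical next_tokens callback standing in for the fine-tuned LLM, and it explores every continuation whose cumulative probability stays above a cut-off instead of committing to a single greedy or sampled path.

import math
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for the LLM: given a token prefix, return the
# log-probability of each candidate next token. (Illustrative only.)
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]

def dfs_decode(next_tokens: NextTokenFn,
               eos: str = "<eos>",
               min_logprob: float = math.log(0.05),
               max_len: int = 32) -> List[Tuple[Tuple[str, ...], float]]:
    """Depth-first search over token continuations.

    Rather than sampling one sequence, follow every branch whose cumulative
    log-probability stays above min_logprob and return all completed
    sequences with their scores, so low-probability candidate solutions are
    pruned early instead of being committed to greedily.
    """
    results: List[Tuple[Tuple[str, ...], float]] = []

    def expand(prefix: Tuple[str, ...], logprob: float) -> None:
        if len(prefix) >= max_len:
            return
        for tok, lp in sorted(next_tokens(prefix).items(), key=lambda kv: -kv[1]):
            total = logprob + lp
            if total < min_logprob:          # prune unlikely branches
                continue
            if tok == eos:
                results.append((prefix, total))
            else:
                expand(prefix + (tok,), total)

    expand((), 0.0)
    return sorted(results, key=lambda r: -r[1])

# Toy distribution so the sketch runs without a real model.
toy = {
    (): {"1": math.log(0.7), "2": math.log(0.3)},
    ("1",): {"1": math.log(0.6), "<eos>": math.log(0.4)},
    ("2",): {"<eos>": 0.0},
    ("1", "1"): {"<eos>": 0.0},
}

if __name__ == "__main__":
    for seq, lp in dfs_decode(lambda p: toy.get(p, {"<eos>": 0.0})):
        print(seq, round(math.exp(lp), 3))

In a real setup the candidate tokens and their probabilities would come from the model itself; the toy dictionary above only exists to keep the example self-contained.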

SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.

***

Jan Disselhoff

Daniel Franzen

TRANSCRIPT AND BACKGROUND READING:

TOC
1. Solution Architecture and Strategy Overview
[00:00:00] 1.1 Initial Solution Overview and Model Architecture
[00:04:25] 1.2 LLM Capabilities and Dataset Approach
[00:10:51] 1.3 Test-Time Training and Data Augmentation Strategies
[00:14:08] 1.4 Sampling Methods and Search Implementation
[00:17:52] 1.5 ARC vs Language Model Context Comparison

2. LLM Search and Model Implementation
[00:21:53] 2.1 LLM-Guided Search Approaches and Solution Validation
[00:27:04] 2.2 Symmetry Augmentation and Model Architecture
[00:30:11] 2.3 Model Intelligence Characteristics and Performance
[00:37:23] 2.4 Tokenization and Numerical Processing Challenges

3. Advanced Training and Optimization
[00:45:15] 3.1 DFS Token Selection and Probability Thresholds
[00:49:41] 3.2 Model Size and Fine-tuning Performance Trade-offs
[00:53:07] 3.3 LoRA Implementation and Catastrophic Forgetting Prevention
[00:56:10] 3.4 Training Infrastructure and Optimization Experiments
[01:02:34] 3.5 Search Tree Analysis and Entropy Distribution Patterns

REFS
[00:01:05] Winning ARC 2024 solution using 12B param model, Franzen, Disselhoff, Hartmann

[00:03:40] Robustness of analogical reasoning in LLMs, Melanie Mitchell

[00:07:50] Re-ARC dataset generator for ARC task variations, Michael Hodel

[00:15:00] Analysis of search methods in LLMs (greedy, beam, DFS), Chen et al.

[00:16:55] Language model reachability space exploration, University of Toronto

[00:22:30] GPT-4 guided code solutions for ARC tasks, Ryan Greenblatt

[00:41:20] GPT tokenization approach for numbers, OpenAI

[00:46:25] DFS in AI search strategies, Russell & Norvig

[00:53:10] Paper on catastrophic forgetting in neural networks, Kirkpatrick et al.

[00:54:00] LoRA for efficient fine-tuning of LLMs, Hu et al.

[00:57:20] NVIDIA H100 Tensor Core GPU specs, NVIDIA

[01:04:55] Original MCTS in computer Go, Yifan Jin

COMMENTS

Important reminder: It has AGI in the name not because something that can solve it IS AGI, but because something that CAN'T solve it would NOT be AGI.

kylemorris

While impressive, I'm not sure that it really moves the needle on finding generalisable methods. The major gains they made either made the model more specific or were essentially a hand-coded method of generating a reward function.

reltnek

"- They (LLMs) are better at discriminating than generating and they are also capable of knowing if they don't known how to get there" 🤔 Really cool guests in the video. LLM is still the engine that solves the Tasks even after it have a new Epistemic world view added.

isajoha

Great work, guys. Great questions, Tim.

_obdo_

Didn't o3 get a higher score? Update: I think it is because o3 is closed-source, hence not qualified for the prize.

tylermoore

Kudos to them, it's a masterclass in finding what works, beating everyone else with efficient engineering. It is totally a "Chomsky's bulldozer" solution though.

luke.perkin.online

What about a variable temperature at runtime, so the model can set its own temperature or other sampling parameters according to the current situation? It could set the temperature high when it needs to generate new ideas and be creative, and set it to zero when it needs to output a result like a solution grid, where it can't make mistakes.
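A rough Python sketch of what such per-step temperature control could look like (purely illustrative; the logits_fn callback and the "<grid>" marker token are assumptions, not anything described in the video):

import math
import random
from typing import Callable, Dict, List

# Hypothetical stand-in for the model: token prefix -> logits per candidate token.
LogitsFn = Callable[[List[str]], Dict[str, float]]

def sample_with_schedule(logits_fn: LogitsFn,
                         temperature_for: Callable[[List[str]], float],
                         eos: str = "<eos>",
                         max_len: int = 64) -> List[str]:
    """Sample a sequence while letting the temperature vary per step.

    temperature_for inspects the tokens generated so far and returns the
    temperature for the next step, e.g. high while "brainstorming" and
    near zero once the answer grid is being written out.
    """
    out: List[str] = []
    while len(out) < max_len:
        logits = logits_fn(out)
        t = max(temperature_for(out), 1e-6)            # avoid division by zero
        scaled = {tok: v / t for tok, v in logits.items()}
        m = max(scaled.values())
        weights = {tok: math.exp(v - m) for tok, v in scaled.items()}
        # A near-zero temperature collapses the weights onto the argmax,
        # so "temperature zero" behaves like deterministic decoding.
        tok = random.choices(list(weights), weights=list(weights.values()))[0]
        if tok == eos:
            break
        out.append(tok)
    return out

# Example schedule: creative until a (hypothetical) "<grid>" marker appears,
# then effectively deterministic while the solution grid is emitted.
def schedule(prefix: List[str]) -> float:
    return 0.0 if "<grid>" in prefix else 1.0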

peterkonrad

I'm really curious what the purpose of that company will be after the ARC challenge.

opusdei

This brings to mind analogies with how people with aphantasia are nevertheless able to reason verbally about non-verbal/spatial problems. I may be completely off beam, of course.

fburton

An AGI system must not be trained entirely on human-provided data; it must discover its training data on its own and train on it by itself.

PisangGoreng-yt

So the same way we solve games with sparse rewards is how they solved ARC. Awesome!

devmentorlive

Well, LLMs already have 2D circuits that respect newlines, likely due to ASCII art, which they all grok.

KevinKreger

I would say your use of the word "conquered" is painfully inaccurate. More appropriate would have been: "are doing a little bit better" ...

benhermans

So basically a depth-first tree search was the solution???

dadplaysgolf

So they trained on the evaluation data. No wonder they had the highest score. They even said the score was way lower when they didn't train directly on the benchmark data.

ZacheryGlass