LLM - Reasoning SOLVED (new research)
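Grokking transformers: training a transformer far beyond the point where it has memorized its training data, until it suddenly generalizes, is presented here as a way to give transformers near-perfect implicit reasoning abilities. (Note: grokking has nothing to do with Elon Musk's AI model Grok or with Groq Inc., the company known for fast inference.)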
Grokking achieves this by enabling transformers to identify the hierarchical structure in the knowledge they are trained on. Through extended training, the transformer's internal structure undergoes a fundamental shift that allows specific neural pathways, called "generalizing circuits," to form. These circuits are instrumental in efficiently encoding and retrieving knowledge for reasoning tasks. Several key elements are needed to create grokked transformers.
First, extensive training is essential, particularly for complex reasoning tasks that require structured knowledge. Second, the transformer needs an appropriate depth (number of layers) that balances computational efficiency with reasoning performance. Third, a carefully designed training dataset is crucial: it should combine atomic facts and inferred facts, mirroring a formal system of axioms and the theorems derived from them. Grokked transformers are tested on out-of-distribution examples, which differ significantly from the training data, to assess how well they generalize.
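The atomic/inferred split can be made concrete with a small synthetic knowledge graph. The sketch below is only an illustration: the function and parameter names are hypothetical, and all sizes are arbitrary assumptions rather than the paper's settings. Atomic facts map a (head, relation) pair to a tail entity, and two-hop inferred facts are derived from them. A small share of atomic facts is marked out-of-distribution (OOD). These facts are still trained on as atomic facts, but every inferred fact built from them is held out for the OOD test set. This OOD rule is a simplifying assumption.

```python
import random


def build_composition_dataset(n_entities=200, n_relations=20, facts_per_entity=10,
                              inferred_ratio=3.0, ood_fraction=0.05,
                              id_test_size=500, seed=0):
    """Toy two-hop composition data: atomic facts ("axioms") plus inferred facts ("theorems").

    Hypothetical helper; sizes and the OOD rule are illustrative assumptions.
    """
    rng = random.Random(seed)

    # Atomic facts: each (head, relation) pair maps to exactly one tail entity.
    atomic = {}
    for h in range(n_entities):
        for r in rng.sample(range(n_relations), facts_per_entity):
            atomic[(h, r)] = rng.randrange(n_entities)

    # Mark a small share of atomic facts as OOD. They stay in the training set as
    # atomic facts but never appear inside a training inferred fact.
    ood_keys = set(rng.sample(sorted(atomic), int(len(atomic) * ood_fraction)))

    # Inferred facts: h -r1-> b and b -r2-> t yield (h, r1, r2) -> t.
    id_inferred, ood_inferred = [], []
    for (h, r1), b in atomic.items():
        for r2 in range(n_relations):
            if (b, r2) not in atomic:
                continue
            fact = ((h, r1, r2), atomic[(b, r2)])
            if (h, r1) in ood_keys or (b, r2) in ood_keys:
                ood_inferred.append(fact)
            else:
                id_inferred.append(fact)

    # Hold out some in-distribution inferred facts for testing, then pick training
    # inferred facts so that |inferred train| / |atomic| is about `inferred_ratio`.
    rng.shuffle(id_inferred)
    test_id, pool = id_inferred[:id_test_size], id_inferred[id_test_size:]
    n_train_inferred = min(len(pool), int(inferred_ratio * len(atomic)))
    train = list(atomic.items()) + pool[:n_train_inferred]
    return {"train": train, "test_id": test_id, "test_ood": ood_inferred,
            "n_atomic": len(atomic), "n_train_inferred": n_train_inferred}
```

The model sees every atomic fact during training. Test accuracy on `test_id` measures generalization to new combinations of familiar facts. Test accuracy on `test_ood` measures whether the model can compose facts it has only ever seen in isolation.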
Grokked transformers excel at two tasks. The first is composition, which chains stored facts (for example, combining "A's mother is B" with "B's employer is C" to infer that A's mother's employer is C); here they outperform methods that rely on external knowledge. The second is comparison, where they reason about similarities or differences between entities (for example, whether one person is older than another). The ratio of inferred to atomic facts, the number of layers in the transformer, and the distribution of data within the training set all influence grokking performance.
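The comparison task follows the same atomic/inferred pattern. Atomic facts give each entity a value for an attribute, and inferred facts state how two entities compare on that attribute. The sketch below is an illustration only: the names are hypothetical and the sizes are arbitrary assumptions. It also exposes the inferred-to-atomic ratio as an explicit knob, because that ratio is one of the factors said to influence grokking performance.

```python
import random


def compare(x, y):
    """Comparison label derived from two atomic attribute values."""
    return "<" if x < y else (">" if x > y else "=")


def build_comparison_dataset(n_entities=100, n_attributes=5, n_values=20,
                             inferred_ratio=3.0, seed=0):
    """Hypothetical toy comparison data; returns (atomic, train_inferred, held_out, phi)."""
    rng = random.Random(seed)
    # Atomic facts: every entity gets an integer value for every attribute.
    atomic = {(e, a): rng.randrange(n_values)
              for e in range(n_entities) for a in range(n_attributes)}
    # Inferred facts: (e1, e2, attribute) -> label, for all ordered pairs e1 != e2.
    inferred = [((e1, e2, a), compare(atomic[(e1, a)], atomic[(e2, a)]))
                for a in range(n_attributes)
                for e1 in range(n_entities)
                for e2 in range(n_entities) if e1 != e2]
    rng.shuffle(inferred)
    # Keep inferred facts in proportion to the atomic facts; the rest are held out.
    n_train = min(len(inferred), int(inferred_ratio * len(atomic)))
    phi = n_train / len(atomic)  # inferred-to-atomic ratio of the training set
    return atomic, inferred[:n_train], inferred[n_train:], phi
```

To explore how the ratio affects training, you can sweep `inferred_ratio` while holding the atomic facts fixed. The total amount of training data and the ratio are then varied separately.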
To understand how grokked transformers work, we can use techniques such as the logit lens and causal tracing. The logit lens decodes intermediate hidden states through the model's output layer to show which prediction each layer already encodes. Causal tracing measures how much individual hidden states causally contribute to the output by corrupting the input and then restoring single activations. In conclusion, grokked transformers represent a promising approach to achieving near-perfect implicit reasoning in large language models.
By carefully designing the training data, optimizing the architecture, and analyzing the trained model with the logit lens and causal tracing, we can unlock the potential of grokked transformers for a range of reasoning challenges.
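The two analysis techniques, the logit lens and causal tracing, can be sketched on a toy model. The code below builds a tiny residual network over two token positions. Its weights are random and untrained, and every name and size is a hypothetical choice, so the outputs are meaningless; the code only shows the mechanics. The logit lens decodes each layer's hidden state with the final unembedding matrix. Real implementations usually apply the final layer norm first, but this toy model has none. The causal-tracing function is a simplified variant, often called activation patching. It corrupts the input by swapping a token instead of adding noise. It then restores one clean hidden state at a time and records the probability of the clean prediction.

```python
import math
import random

D, VOCAB, LAYERS = 8, 10, 4          # hypothetical toy sizes
rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1 / math.sqrt(cols)) for _ in range(cols)] for _ in range(rows)]


EMBED = rand_matrix(VOCAB, D)                        # row t = embedding of token t
LAYER_W = [rand_matrix(D, D) for _ in range(LAYERS)]
UNEMBED = rand_matrix(VOCAB, D)                      # hidden state -> vocab logits


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)
    e = [math.exp(x - top) for x in z]
    s = sum(e)
    return [x / s for x in e]


def forward(tokens, patch=None):
    """Return hidden states indexed [layer][position]; layer 0 is the embedding.

    patch=(layer, pos, vector) overwrites one hidden state before later layers read it.
    """
    def apply_patch(h, layer):
        if patch is not None and patch[0] == layer:
            h[patch[1]] = list(patch[2])

    h = [list(EMBED[t]) for t in tokens]
    apply_patch(h, 0)
    states = [[list(x) for x in h]]
    for layer, w in enumerate(LAYER_W, start=1):
        new_h = []
        for i in range(len(h)):
            # Causal mixing: position i reads the mean of positions 0..i.
            ctx = [sum(h[j][k] for j in range(i + 1)) / (i + 1) for k in range(D)]
            new_h.append([x + math.tanh(u) for x, u in zip(h[i], matvec(w, ctx))])
        h = new_h
        apply_patch(h, layer)
        states.append([list(x) for x in h])
    return states


def predict(state):
    logits = matvec(UNEMBED, state)
    return max(range(VOCAB), key=logits.__getitem__)


def logit_lens(states, pos=-1):
    """Token each layer would predict at `pos` if its hidden state were decoded directly."""
    return [predict(layer[pos]) for layer in states]


def causal_trace(clean, corrupted, pos):
    """P(clean answer) when the clean state at (layer, pos) is restored in the corrupted run."""
    clean_states = forward(clean)
    target = predict(clean_states[-1][-1])
    probs = []
    for layer in range(LAYERS + 1):
        run = forward(corrupted, patch=(layer, pos, clean_states[layer][pos]))
        probs.append(softmax(matvec(UNEMBED, run[-1][-1]))[target])
    return probs
```

For example, `logit_lens(forward([3, 7]))` lists the token decoded after each layer. `causal_trace([3, 7], [5, 7], pos=0)` shows at which layers the first position's clean state still carries the information needed for the clean prediction. A useful sanity check is that restoring position 0 at layer 0 reproduces the clean run exactly.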
All rights w/ authors:
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
#airesearch
#ainews