ICML 2024 Tutorial: Physics of Language Models


Abstract: We divide "intelligence" into multiple dimensions (such as language structures, knowledge, and reasoning). For each dimension, we create synthetic data for LLM pretraining to understand the theory and push the capabilities of LLMs to the extreme.

Unlike benchmarking, by controlling the synthetic data, we aim to discover universal laws of all LLMs, not just a specific version like GPT/Llama. By tweaking hyperparameters such as data amount, type, difficulty, and format, we determine factors affecting LLM performance and suggest improvements.
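
As a rough illustration of what "controlling the synthetic data" can look like (a toy sketch, not code from the tutorial), the snippet below generates a template-based corpus in which every fact is known by construction, then sweeps one knob at a time. All names and knobs here (make_corpus, rewrites_per_entity, train_lm, evaluate_extraction) are hypothetical placeholders.

```python
# Toy sketch of a controlled synthetic-data sweep; not the tutorial's code.
import random

TEMPLATES = [
    "{name} was born on {date} in {city}.",
    "Born in {city} on {date}, {name} went on to study {major}.",
]

def make_corpus(num_entities: int, rewrites_per_entity: int, seed: int = 0):
    """Generate a corpus where every fact is known by construction."""
    rng = random.Random(seed)
    corpus = []
    for i in range(num_entities):
        fields = {
            "name": f"Person{i}",
            "city": rng.choice(["Paris", "Cairo", "Lima", "Osaka"]),
            "date": f"{rng.randint(1, 28)} May {rng.randint(1900, 2000)}",
            "major": rng.choice(["physics", "law", "music"]),
        }
        # More rewrites per entity = more diverse phrasings of the same fact.
        for _ in range(rewrites_per_entity):
            corpus.append(rng.choice(TEMPLATES).format(**fields))
    return corpus

# Vary one knob while holding everything else fixed, then compare models:
for rewrites in (1, 2, 4, 8):
    data = make_corpus(num_entities=10_000, rewrites_per_entity=rewrites)
    # model = train_lm(data); evaluate_extraction(model)  # hypothetical helpers
```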

Unlike black-box training, we develop advanced probing techniques to examine the inner workings of LLMs and understand their hidden mental processes. This helps us gain a deeper understanding of how these AI models function and moves us closer to creating more powerful and transparent AI systems.
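
For intuition only, here is a minimal sketch of one common probing technique (a linear probe on hidden states); this is an assumption-laden toy, not the tutorial's actual method. The model name "gpt2", the layer choice, and the four hand-written examples are all illustrative stand-ins.

```python
# Minimal linear-probing sketch: test whether a property of the input is
# linearly decodable from a model's hidden states. Not the tutorial's code.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def last_token_state(text: str, layer: int = 6) -> torch.Tensor:
    """Hidden state of the final token at a chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Toy true/false property; a real probe would evaluate on held-out data.
texts = ["Paris is the capital of France.", "Berlin is the capital of France.",
         "Tokyo is the capital of Japan.", "Cairo is the capital of Japan."]
labels = [1, 0, 1, 0]

X = torch.stack([last_token_state(t) for t in texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("linear probe accuracy:", probe.score(X, labels))
```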

This talk will cover language structures (Part 1), reasoning (Part 2), and knowledge (Part 3). These sections explain why and how language models succeed or fail on certain AI tasks and provide practical suggestions for necessary changes to (1) model architecture, (2) data preparation, and (3) the training process to move us closer to AGI.

Timecodes
0:00 - Prelude
11:37 - Part 3: Knowledge
14:49 - Part 3.1: Knowledge Storage and Extraction
25:42 - Summary of Part 3.1
26:46 - Part 3.2: Knowledge Manipulation
35:19 - Summary of Part 3.2
37:00 - Part 3.3: Knowledge Capacity Scaling Laws
49:54 - Summary of Part 3.3
51:26 - Summary of Part 3
53:28 - Part 2.1: Grade-School Math and the Hidden Reasoning Process
1:18:57 - Summary of Part 2.1
1:20:37 - Part 2.2: How to Learn From Mistakes on Grade-School Math Problems
1:31:23 - Summary of Part 2.2
1:32:10 - Summary of Part 2
1:33:22 - Part 1: Hierarchical Language Structures
1:49:23 - Summary of Part 1
Comments

Very high information density. Each subsection is worth an individual paper.

XiaoBaBa

This is crazy. I'm like a low-dimensional creature witnessing a high-dimensional creature's thinking process and experimental methods. The testbed is so well chosen that I'll build on it to learn more. Thank you so much.

icriou

This is the most insightful talk I have ever seen. It has taught me how powerful controlled experiments can be!

manncodes

Great work! The research questions are simple, and the answers are profound yet not overly complex; they are exactly what we need to understand how LLMs work. I hope you continue to demystify the limitations of LLMs and even find technical remedies. One of the highest signal-to-noise pieces I've had the pleasure to enjoy in quite some time!

sucim

One of the clearest talks about LLMs I've ever heard.

leaderfeng

Allen, this is such a wonderful talk. So many thanks for putting this online!

sheikhshafayat

Awesome talk, I am glad it is available now for those who can't make it to ICML in person!

salemon

Awesome talk! A thorough and detailed investigation into multiple aspects of LLM learning and behavior.

sacramentofwilderness

Great talk. Genius level. That said, imho, it takes humans around 20 years of education, filled with structured training and guidance, to learn what to remember and prioritize before we can effectively apply the scientific method, and that training is continuously updated and refined. Expecting an LLM to reason and make decisions without specific training seems unrealistic. The issues with AI reasoning are genuine, but the need for training is normal and essential. Once properly trained, using expensive CoT or any new or future methodology, I would hope that an LLM could perform tasks at a much faster rate; this training is crucial because knowledge and reasoning, whether in humans or LLMs, don't come as a magical solution without effort and guidance. I'm no specialist, so take what I say with a grain of salt, or a bucket. Again, great talk. ty

casual.dojo.

Okay, this was a masterclass-level presentation. Great job, Zeyuan!

fdumitru

Absolutely the best talk I've seen in a long time. This is how you do science!

EnesDeumic

What a smart guy! His speech is excellent.

Dron

Thanks a lot for the illuminating and inspiring research! And of course for the extremely dense and entertaining presentation!

vorushin

Awesome talk! The best talk I have ever seen! And this is my first time leaving a comment on YouTube...

spoiled

Can anyone explain why pretraining on bio + fine-tuning on QA is so different from mixed pretraining on bio + QA? What if the fine-tune were a full fine-tune updating all weights? Why would that be any different from including QA in pretraining? (re 26:21)

PatrickvanNieuwenhuizen

That's a terrible... [backspace] ...amazing talk!

fangzhangmnm

Good job! You explained it very clearly! One of the best talks I have watched recently.

hahiZY

Fantastic talk. This is exactly the kind of combination I love to see: hypotheses backed up with disciplined, investigative research.

islandfireballkill

Thank you very much for posting this talk!

anshitasaxena

Excellent talk: a great demonstration of how LLMs work and, more importantly, how research works in general. A breath of fresh air amid the hype.

haoliang