ICML 2024 Tutorial: Physics of Language Models


Abstract: We divide "intelligence" into multiple dimensions (such as language structures, knowledge, and reasoning). For each dimension, we create synthetic data for LLM pretraining to understand the theory and push the capabilities of LLMs to the extreme.

Unlike benchmarking, by controlling the synthetic data, we aim to discover universal laws of all LLMs, not just a specific version like GPT/Llama. By tweaking hyperparameters such as data amount, type, difficulty, and format, we determine factors affecting LLM performance and suggest improvements.
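
As a rough illustration of what "controlling the synthetic data" can look like (a toy sketch, not code from the tutorial), the snippet below generates a template-based corpus in which every fact is known by construction, then sweeps one knob at a time. All names and knobs here (make_corpus, rewrites_per_entity, train_lm, evaluate_extraction) are hypothetical placeholders.

```python
# Toy sketch of a controlled synthetic-data sweep; not the tutorial's code.
import random

TEMPLATES = [
    "{name} was born on {date} in {city}.",
    "Born in {city} on {date}, {name} went on to study {major}.",
]

def make_corpus(num_entities: int, rewrites_per_entity: int, seed: int = 0):
    """Generate a corpus where every fact is known by construction."""
    rng = random.Random(seed)
    corpus = []
    for i in range(num_entities):
        fields = {
            "name": f"Person{i}",
            "city": rng.choice(["Paris", "Cairo", "Lima", "Osaka"]),
            "date": f"{rng.randint(1, 28)} May {rng.randint(1900, 2000)}",
            "major": rng.choice(["physics", "law", "music"]),
        }
        # More rewrites per entity = more diverse phrasings of the same fact.
        for _ in range(rewrites_per_entity):
            corpus.append(rng.choice(TEMPLATES).format(**fields))
    return corpus

# Vary one knob while holding everything else fixed, then compare models:
for rewrites in (1, 2, 4, 8):
    data = make_corpus(num_entities=10_000, rewrites_per_entity=rewrites)
    # model = train_lm(data); evaluate_extraction(model)  # hypothetical helpers
```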

Unlike black-box training, we develop advanced probing techniques to examine the inner workings of LLMs and understand their hidden mental processes. This helps us gain a deeper understanding of how these AI models function and moves us closer to creating more powerful and transparent AI systems.
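
For intuition only, here is a minimal sketch of one common probing technique (a linear probe on hidden states); this is an assumption-laden toy, not the tutorial's actual method. The model name "gpt2", the layer choice, and the four hand-written examples are all illustrative stand-ins.

```python
# Minimal linear-probing sketch: test whether a property of the input is
# linearly decodable from a model's hidden states. Not the tutorial's code.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True).eval()

def last_token_state(text: str, layer: int = 6) -> torch.Tensor:
    """Hidden state of the final token at a chosen layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1]  # shape: (hidden_dim,)

# Toy true/false property; a real probe would evaluate on held-out data.
texts = ["Paris is the capital of France.", "Berlin is the capital of France.",
         "Tokyo is the capital of Japan.", "Cairo is the capital of Japan."]
labels = [1, 0, 1, 0]

X = torch.stack([last_token_state(t) for t in texts]).numpy()
probe = LogisticRegression(max_iter=1000).fit(X, labels)
print("linear probe accuracy:", probe.score(X, labels))
```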

This talk will cover language structures (Part 1), reasoning (Part 2), and knowledge (Part 3). These sections explain why and how language models succeed or fail on certain AI tasks and provide practical suggestions for necessary changes to (1) model architecture, (2) data preparation, and (3) the training process to move us closer to AGI.

Timecodes
0:00 - Prelude
11:37 - Part 3: Knowledge
14:49 - Part 3.1: Knowledge Storage and Extraction
25:42 - Summary of Part 3.1
26:46 - Part 3.2: Knowledge Manipulation
35:19 - Summary of Part 3.2
37:00 - Part 3.3: Knowledge Capacity Scaling Laws
49:54 - Summary of Part 3.3
51:26 - Summary of Part 3
53:28 - Part 2.1: Grade-School Math and the Hidden Reasoning Process
1:18:57 - Summary of Part 2.1
1:20:37 - Part 2.2: How to Learn From Mistakes on Grade-School Math Problems
1:31:23 - Summary of Part 2.2
1:32:10 - Summary of Part 2
1:33:22 - Part 1: Hierarchical Language Structures
1:49:23 - Summary of Part 1
Comments

Very high information density. Each subsection is worth an individual paper.

XiaoBaBa

This is crazy. I'm like a low-dimensional creature witnessing a high-dimensional creature's thinking process and experimental methods. The testbed is so well chosen that I'll build on it to learn more. Thank you so much.

icriou

This is the most insightful talk I have ever seen. It has taught me how powerful controlled experiments can be!

manncodes

Great work! The research questions are simple, and the answers are profound yet not overly complex; they are exactly what we need to understand how LLMs work. I hope you continue to demystify the limitations of LLMs and even find technical remedies. One of the highest signal-to-noise pieces I've had the pleasure to enjoy in quite some time!

sucim

One of the clearest talks about LLMs I've ever heard.

leaderfeng

Allen, this is such a wonderful talk. So many thanks for putting this online!

sheikhshafayat

Awesome talk, I am glad it is available now for those who can't make it to ICML in person!

salemon

Awesome talk! A thorough and detailed investigation into multiple aspects of LLM learning and behavior.

sacramentofwilderness

Great talk. Genius level. That said, imho, it takes humans around 20 years of education, filled with structured training and guidance, to learn what to remember and prioritize before we can effectively apply the scientific method, and that training is continuously updated and refined. Expecting an LLM to reason and make decisions without specific training seems unrealistic. The issues with AI reasoning are genuine, but the need for training is normal and essential. Once properly trained, using expensive CoT or any new or future methodology, I would hope that an LLM could perform tasks at a much faster rate; this training is crucial because knowledge and reasoning, whether in humans or LLMs, don't come as a magical solution without effort and guidance. I'm no specialist, so take what I say with a grain of salt, or a bucket. Again, great talk. ty

casual.dojo.

Okay, this was a masterclass-level presentation. Great job, Zeyuan!

fdumitru

Absolutely the best talk I've seen in a long time. This is how you do science!

EnesDeumic

What a smart guy! His speech is excellent.

Dron

Thanks a lot for the illuminating and inspiring research! And of course for the extremely dense and entertaining presentation!

vorushin

Awesome talk! The best talk I have ever seen! And this is my first time leaving a comment on YouTube...

spoiled

Can anyone explain why pretraining on bio + fine-tuning on QA is so different from mixed pretraining on bio + QA? What if the fine-tune were a full fine-tune updating all weights? Why would that be any different from including QA in pretraining? (re 26:21)

PatrickvanNieuwenhuizen

That's a terrible... [backspace] ...amazing talk!

fangzhangmnm

Good job! You explained it very clearly! One of the best talks I have watched recently.

hahiZY

Fantastic talk. This is exactly the kind of combination I love to see: hypotheses backed up with disciplined, investigative research.

islandfireballkill

Thank you very much for posting this talk!

anshitasaxena

Excellent talk: a great demonstration of how LLMs work and, more importantly, how research works in general. A breath of fresh air amid the hype.

haoliang