Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process

Timecodes
0:00 - Prelude
5:32 - Result 1, the iGSM dataset
18:16 - Result 2, accuracy and reasoning-length generalization
23:56 - Result 3, level-0 vs. level-1 reasoning skill
27:56 - Result 4, V-probing technique details
40:11 - Result 5, level-2 reasoning skill
43:44 - Partial summary
44:34 - Result 6, how LLMs make reasoning mistakes
52:12 - Result 7, scaling law for reasoning
54:53 - Result 8, layer-by-layer reasoning
59:53 - Summary
Comments

Have to give you your flowers. Personally, I believe this is the most important modern series on fundamental insights into language models. I literally keep your series close by at all times 😂. It helps solidify so many intuitions I've had and seen in my own research. Most importantly, current post-training processes are all sub-optimal. These behaviors must be learned at pretraining, data engineering informed by Curry-Howard theory is the ideal format for LM reasoning, and the concept of data views (permutations of the same set of axioms) must be a thing when multi-epoch training is used. We must scale data engineering. A million thanks. You're a rockstar in my book.

zandrrlife

Thank you for recording these videos, very inspiring.

lucasbeyer

The setup is very, very clever. Rarely see this kind of rigorous yet intuitive experimentation in ML/DL these days. Thank you for the lucid explanation.

thsunkid

Thanks for doing this important work; it's also essential for practitioners!

trummelbummelfriedlich

Thank you for this video. This is really thought provoking.

ayushthakur

Super insightful! Perhaps a Mixture-of-Depths model could be trained, and its depth activations might then be quite illustrative of how reasoning relates to depth.

fengliang

I need to know what you eat, what you drink, what time you wake up, what pets you own, which textbooks you studied, your vim bindings, your glasses prescription...
Monster series, congrats! I hope Meta is allowing you to do more good work.

eafdeafdeafd

Great content, thanks. Keep it coming.

sanesanyo

Very impressive content, thanks for sharing!

franciszhou

This is the best LM talk I have ever heard. Can I ask a quick question? How do you rule out the possibility that the model has done all the calculations (level-0 reasoning) before it starts the CoT? Thank you!

hongyao

Thank you for explaining your ideas! I really appreciate it.

optimaiz

Were the 1600 problems manually constructed (structure graph, DAG, CoT)?

andrea-mjce