MIT EI seminar, Hyung Won Chung from OpenAI. 'Don't teach. Incentivize.'

I gave this talk last year, when I was thinking about a paradigm shift. This delayed posting is timely, as we just released o1, which I believe is a new paradigm. It's a good time to zoom out for high-level thinking.

I titled the talk “Don’t teach. Incentivize”. We can’t enumerate every single skill we want from an AGI system because there are just too many of them. In my view, the only feasible way is to incentivize the model such that general skills emerge.

I use next token prediction as a running example, interpreting it as a weak incentive structure. It is massive multitask learning that incentivizes the model to learn a small number of general skills to solve trillions of tasks, as opposed to dealing with each task individually.

If you try to solve tens of tasks with the least effort possible, then pattern-recognizing each task separately might be easiest.

If you try to solve trillions of tasks, it might be easier to solve them by learning generalizable skills, e.g. language understanding and reasoning.
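To make the running example concrete, here is a minimal sketch (my own addition, not code from the talk) of the next-token prediction objective: a single uniform cross-entropy signal, with a toy embedding-plus-linear model standing in for a real LLM.

```python
import torch
import torch.nn.functional as F

# Toy stand-in for an LLM: embedding + linear head over a small vocabulary.
vocab_size, d_model = 100, 32
embed = torch.nn.Embedding(vocab_size, d_model)
head = torch.nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (4, 16))   # a batch of toy token sequences
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # shift by one: predict the next token

logits = head(embed(inputs))                     # (batch, seq-1, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab_size),
                       targets.reshape(-1))
loss.backward()
# Whatever the text contains -- translation, arithmetic, code -- the model only
# ever receives this one weak signal: predict the next token.
```

The loss itself encodes no particular task; it is the diversity of the training text that implicitly bundles trillions of tasks into this single weak incentive.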

An analogy I used is extending the old saying: "Give a man a fish, you feed him for a day. Teach him how to fish, you feed him for a lifetime." I go one step further and solve this task with an incentive-based method: "Teach him the taste of fish and make him hungry."

Then he will go out and learn to fish. In doing so, he will pick up other skills, such as being patient, learning to read the weather, and learning about fish. Some of these skills are general and can be applied to other tasks.

You might think that it takes too long to teach via the incentive instead of direct teaching. That is true for humans, but for machines, we can give more compute to shorten the time. In fact, I'd say this "slower" method allows us to put in more compute.

This has interesting implications for the generalist-vs-specialist tradeoff. Such a tradeoff exists for humans because time spent specializing in a topic is time not spent generalizing. For machines, that doesn't apply: some models get to enjoy 10,000x more compute.

Another analogy is the "Room of Spirit and Time" from Dragon Ball. You train for one year inside the room, and only a day passes outside; the multiplier is 365. For machines it is far higher. So a strong generalist with more compute is often better in specialized domains than specialists are.

I hope this lecture sparks interest in high-level thinking, which will be useful in building better perspectives. This in turn will lead to finding more impactful problems to solve.

Comments

"teach him taste of fish and make him hungry"🤣

hengfun

I like your concept of scaling:
1) identify the modeling assumptions or inductive biases that bottleneck further scaling
2) replace them with more scalable ones.
Example: letting the model learn its own representations is a more scalable approach.

JimSlattery

Thanks for the insightful talk!
I love the clarity at 18:50 of seeing the LLM going through training, with so many skills implicitly demanded by the next token prediction task.

JimSlattery

Somehow Hyung Won Chung's talks are always very abstract and purely at a meta level. He doesn't talk about specific LLM techniques or anything like that, but goes all in on the fundamental intuition behind scaling 👍

windmaple

Hyung Won, you are truly amazing. I've subscribed and will always be cheering you on!! Please keep making more videos.

honon-cswl

Great point: The Bitter Lesson is the single most important piece of writing in the field of AI. 😳

JimSlattery

"No amount of bananas can incentivize monkeys to do mathematical reasoning" lol

chenjus

One of the best talks on YouTube right now!

wwkk

Thank you for the great lecture. What you said about constant unlearning felt like it opened my eyes to the world. I had never given much thought to the idea that intuitions and ideas built on false axioms need to be dismantled.

SONJOGYO

such an incredibly information dense talk, thank you!

vrushankdesai

So cool! I imagine the heavy workload must be tough. Please take care of your health too.

조바이든-rr

yep - we should follow this principle in architecture too!

-mwolf

referenced talk:
John Schulman - Reinforcement Learning from Human Feedback: Progress and Challenges

andrewthomas

I was surprised to learn that you majored in mechanical engineering for your PhD. How did you make such a big move?

hsuai

"MIT EI seminar" reads like "With egg seminar" when you are German, super weird title.... Is it about breakfast? 😂

madrooky