GPT-2 Teaches GPT-4: Weak-to-Strong Generalization



Patreon Supporters:
- Tsubasa Kato
- Mike Wolf
- Paiva
- Tassilo Neubauer
- MonikerEpsilon
- Alexey Malafeev
- Jack Seroy
- JJ Hepburn
- Max Chiswick
- William Freire
- Edward Huff
- Gunnar Höglund
- Ryan Coppolo
- Cameron Holmes
- Emil Wallner
- Jesse Hoogland
- Jacques Thibodeau
- Vincent Weisser
Comments

TheInsideView (author)

some comments I received that didn't make it into the final cut:

- "for imitation saliency: there are some results in appendix E.3 figure 27 that show that if the strong model could actually imitate the weak model the generalization basically goes away"
- "I would emphasize more that the pretraining leakage problem is about the pretraining data leaking implicit supervision from humans, which sort of breaks the analogy where the only supervision the strong models are supposed to have is from the weak models"
- "I think the zero-shot baseline is an important caveat in terms of these techniques actually being useful"
- "I know you sort of mention this near the end, but when watching it the first time I thought you were implying that the chess puzzles and RMs also used the confidence aux loss when they don't"
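For reference, the confidence aux loss mentioned in that last comment is, roughly, a mix of cross-entropy against the weak supervisor's labels and cross-entropy against the strong model's own hardened (thresholded) predictions. A minimal sketch for a binary task — the paper's actual version uses an adaptive threshold and warms up the mixing weight alpha over training, which is omitted here:

```python
import math

def confidence_aux_loss(strong_prob, weak_label, alpha=0.75):
    """Sketch of the auxiliary confidence loss (binary case).

    Mixes cross-entropy to the weak label with cross-entropy to the
    strong model's own hardened prediction, so the strong model is
    rewarded for staying confident even when it disagrees with the
    weak supervisor.
    """
    eps = 1e-12  # avoid log(0)
    # Hardened self-prediction: threshold the strong model's probability.
    hard_self = 1.0 if strong_prob > 0.5 else 0.0

    def ce(p, target):
        return -(target * math.log(p + eps) + (1 - target) * math.log(1 - p + eps))

    return (1 - alpha) * ce(strong_prob, weak_label) + alpha * ce(strong_prob, hard_self)
```

With alpha > 0, a strong model that confidently disagrees with a weak label is penalized far less than under plain cross-entropy to the weak label, which is what lets it override weak-supervisor mistakes.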

TheInsideView (author)

I made this video because a subscriber called Christopher emailed me saying the Collin Burns episode was one of his favorites and that he wanted a video on the weak-to-strong generalization paper.

TheInsideView (author)

Feel like the results in the bar charts are missing a "base strong performance" bar, i.e. the strong model's performance before any weak supervision. What if the strong model is already quite good at 0-shot on the evaluation tasks? That baseline would help quantify how much is actually gained from weak finetuning.
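For context, the paper quantifies this with PGR (performance gap recovered), where the floor is the weak supervisor's performance and the ceiling is the strong model finetuned on ground truth — the comment's point is that a zero-shot strong baseline would be a useful extra reference. A minimal sketch (the function name and the example accuracies are mine):

```python
def performance_gap_recovered(weak, weak_to_strong, strong_ceiling):
    """Fraction of the gap between the weak supervisor and the strong
    model's ceiling that is recovered by weak-supervised finetuning
    (all inputs are accuracies in [0, 1])."""
    return (weak_to_strong - weak) / (strong_ceiling - weak)

# e.g. weak supervisor 60%, weak-to-strong model 75%, ceiling 90%
print(performance_gap_recovered(0.60, 0.75, 0.90))  # → 0.5, half the gap recovered
```

Note that PGR says nothing about where the strong model started: if its zero-shot accuracy were already 75%, the weak finetuning would have gained nothing despite a PGR of 0.5.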

daniellawson