This is what happens when you let AIs debate

Akbir Khan, AI researcher and ICML 2024 Best Paper winner, discusses his work on AI alignment, debate techniques for eliciting truthful AI responses, and the future of artificial intelligence.
Key points discussed:
- Using debate between language models to improve truthfulness in AI responses (a minimal sketch of this setup follows the lists below)
- Scalable oversight for supervising AI models beyond human-level intelligence
- The relationship between intelligence and agency in AI systems
- Challenges in AI safety and alignment
- The potential for a Cambrian explosion in human-like intelligent systems
The discussion also explored broader topics:
- The wisdom of crowds vs. expert knowledge in machine learning debates
- Deceptive alignment and reward tampering in AI systems
- Open-ended AI systems and their implications for development and safety
- The space of possible minds and defining superintelligence
- Cultural evolution and memetics in understanding intelligence
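
Since the debate protocol is the centrepiece of the episode, here is a minimal sketch of how such a setup can be wired together: two debater models argue for opposing answers over a few rounds, and a (typically weaker) judge model picks a winner from the transcript alone. This is an illustration, not code from the paper; the call_llm helper is a hypothetical stand-in for whatever chat-model API you use.

# Sketch of an LLM debate: two debaters argue opposing answers,
# a judge decides from the transcript. call_llm is a placeholder.

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call (e.g. any chat-completion API).
    # Returns canned text so the sketch runs end to end as-is.
    return f"[model response to: {prompt[:60]}...]"

def debate(question: str, answer_a: str, answer_b: str, rounds: int = 2) -> str:
    """Run a fixed number of debate rounds, then ask a judge to decide."""
    transcript: list[str] = []
    for r in range(rounds):
        for side, answer in (("A", answer_a), ("B", answer_b)):
            argument = call_llm(
                f"Question: {question}\n"
                f"You argue that the answer is: {answer}\n"
                "Transcript so far:\n" + "\n".join(transcript) +
                f"\nGive your strongest argument for round {r + 1}."
            )
            transcript.append(f"Debater {side} (round {r + 1}): {argument}")
    # The judge sees only the arguments, not any hidden source material.
    verdict = call_llm(
        f"Question: {question}\n"
        f"Debater A claims: {answer_a}\nDebater B claims: {answer_b}\n"
        "Debate transcript:\n" + "\n".join(transcript) +
        "\nWhich answer is better supported? Reply 'A' or 'B'."
    )
    return verdict

if __name__ == "__main__":
    print(debate("Who wrote the story?", "The narrator's brother", "The narrator"))

In the paper's hidden-information setting, the debaters can quote from a source text the judge never sees, which is what makes the weaker judge's task non-trivial and the protocol a candidate for scalable oversight.
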
Akbir Khan:
TOC (* marks the best bits)
00:00:00 1. Intro: AI alignment and debate techniques for truthful responses *
00:05:00 2. Scalable oversight and hidden information settings
00:10:05 3. AI agency, intelligence, and progress *
00:15:00 4. Base models, RL training, and instrumental goals
00:25:11 5. Deceptive alignment and RL challenges in AI *
00:30:12 6. Open-ended AI systems and future directions
00:35:34 7. Deception, superintelligence, and the space of possible minds *
00:40:00 8. Cultural evolution, memetics, and intelligence measurement
References:
1. [00:00:40] Akbir Khan et al. ICML 2024 Best Paper: "Debating with More Persuasive LLMs Leads to More Truthful Answers"
2. [00:03:28] Yann LeCun on machine learning debates
3. [00:06:05] OpenAI's Superalignment team
4. [00:08:10] Sam Bowman on scalable oversight in AI systems
5. [00:10:35] Sam Bowman on the sandwich protocol
6. [00:14:35] Janus' article on "Simulators" and LLMs
7. [00:16:35] Thomas Suddendorf's book "The Gap: The Science of What Separates Us from Other Animals"
8. [00:19:10] DeepMind on responsible AI
9. [00:20:50] Technological singularity
10. [00:21:30] Eliezer Yudkowsky on FOOM (Fast takeoff)
11. [00:21:45] Sammy Martin on recursive self-improvement in AI
12. [00:24:25] LessWrong community
13. [00:24:35] Nora Belrose on AI alignment and deception
14. [00:25:35] Evan Hubinger on deceptive alignment in AI systems
15. [00:26:50] Anthropic's article on reward tampering in language models
16. [00:32:35] Kenneth Stanley's work on open-endedness in AI
17. [00:34:58] Ryan Greenblatt, Buck Shlegeris et al. on AI safety protocols
18. [00:37:20] Aaron Sloman's concept of 'the space of possible minds'
19. [00:38:25] François Chollet on defining and measuring intelligence in AI
20. [00:42:30] Richard Dawkins on memetics
21. [00:42:45] Jonathan Cook et al. on Artificial Generational Intelligence
22. [00:45:00] Peng on determinants of cryptocurrency pricing