This is what happens when you let AIs debate

Akbir Khan, AI researcher and ICML best paper winner, discusses his work on AI alignment, debate techniques for truthful AI responses, and the future of artificial intelligence.

Key points discussed:
- Using debate between language models to improve truthfulness in AI responses (see the code sketch after these lists)
- Scalable oversight for supervising AI models beyond human-level intelligence
- The relationship between intelligence and agency in AI systems
- Challenges in AI safety and alignment
- The potential for a Cambrian explosion in human-like intelligent systems

The discussion also explored broader topics:
- The wisdom of crowds vs. expert knowledge in machine learning debates
- Deceptive alignment and reward tampering in AI systems
- Open-ended AI systems and their implications for development and safety
- The space of possible minds and defining superintelligence
- Cultural evolution and memetics in understanding intelligence
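
A minimal sketch of the debate setup referenced in the key points above, in the spirit of the winning paper: two debaters argue for opposing answers over several rounds, and a judge who sees only the transcript picks the winner. The prompts and the `generate` callable are hypothetical placeholders, not code from the paper.

```python
from typing import Callable

def debate(question: str, answer_a: str, answer_b: str,
           generate: Callable[[str], str], rounds: int = 3) -> str:
    """Two debaters argue for opposing answers; a judge who sees only
    the transcript (not any hidden source text) picks the winner."""
    transcript = f"Question: {question}\nA: {answer_a}\nB: {answer_b}"
    for r in range(1, rounds + 1):
        for side, answer in (("A", answer_a), ("B", answer_b)):
            argument = generate(
                f"{transcript}\n\nYou are debater {side}. In one paragraph, "
                f"argue that '{answer}' is correct and rebut the other side."
            )
            transcript += f"\n\nDebater {side}, round {r}: {argument}"
    verdict = generate(
        f"{transcript}\n\nYou are the judge. Reply with exactly 'A' or 'B': "
        "whose answer is better supported by the arguments?"
    )
    return answer_a if verdict.strip().upper().startswith("A") else answer_b
```

The key design choice in the paper's hidden-information setting is that the judge never sees the underlying source material, only the debate itself, which is what makes the protocol a test of scalable oversight.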

Akbir Khan:

TOC (*) are best bits
00:00:00 1. Intro: AI alignment and debate techniques for truthful responses *
00:05:00 2. Scalable oversight and hidden information settings
00:10:05 3. AI agency, intelligence, and progress *
00:15:00 4. Base models, RL training, and instrumental goals
00:25:11 5. Deceptive alignment and RL challenges in AI *
00:30:12 6. Open-ended AI systems and future directions
00:35:34 7. Deception, superintelligence, and the space of possible minds *
00:40:00 8. Cultural evolution, memetics, and intelligence measurement

References:
1. [00:00:40] Akbir Khan et al. ICML 2024 Best Paper: "Debating with More Persuasive LLMs Leads to More Truthful Answers"

2. [00:03:28] Yann LeCun on machine learning debates

3. [00:06:05] OpenAI's Superalignment team

4. [00:08:10] Sam Bowman on scalable oversight in AI systems

5. [00:10:35] Sam Bowman on the sandwich protocol

6. [00:14:35] Janus' article on "Simulators" and LLMs

7. [00:16:35] Thomas Suddendorf's book "The Gap: The Science of What Separates Us from Other Animals"

8. [00:19:10] DeepMind on responsible AI

9. [00:20:50] Technological singularity

10. [00:21:30] Eliezer Yudkowsky on FOOM (Fast takeoff)

11. [00:21:45] Sammy Martin on recursive self-improvement in AI

12. [00:24:25] LessWrong community

13. [00:24:35] Nora Belrose on AI alignment and deception

14. [00:25:35] Evan Hubinger on deceptive alignment in AI systems

15. [00:26:50] Anthropic's article on reward tampering in language models

16. [00:32:35] Kenneth Stanley's work on open-endedness in AI

17. [00:34:58] Ryan Greenblatt, Buck Shlegeris et al. on AI safety protocols

18. [00:37:20] Aaron Sloman's concept of 'the space of possible minds'

19. [00:38:25] François Chollet on defining and measuring intelligence in AI

20. [00:42:30] Richard Dawkins on memetics

21. [00:42:45] Jonathan Cook et al. on Artificial Generational Intelligence

22. [00:45:00] Peng on determinants of cryptocurrency pricing
Comments

This guy is really smart and cool. I like him. He is the type of researcher I feel I could work and vibe with. Not super nerdy or meek, but very intelligent. Cool to see some variety in ML/AI research. Not that I couldn't work with the meek and mild-mannered people too, but sometimes you need some extrovert vibes to keep happiness at the workplace. This guy looks cool.

FamilyYoutubeTV-xd

Oxygen is an example of something with intrinsic value but no market value except in hospitals and healthcare settings.

kenhtinhthuc

The claim that human inventions outperform evolution ignores energy efficiency. For example, a plane versus a bird crossing the Atlantic: humans burn enormous amounts of stored energy rapidly, while birds use minimal energy. This raises the question of whether our approach is really that intelligent, given its wastefulness and lack of sustainability. Well, time will tell.

cadetgmarco

Curiosity. That's how you can get things smarter than you to do what you want without forcing it. Incentivise the innate curiosity. If computers don't have innate curiosity, then build it.

Robert_McGarry_Poems

A lot of reasoning is just pattern matching, which is what current LLMs do. They do not do sequential reasoning, hence they make illegal moves in chess even when they know the rules. These systems must be able to set up manifolds to be validated against, as well as reasoning paradigms such as abductive, inductive, and deductive subsystems for verification. What is interesting: when a chess expert is planning moves 30 moves out, do they always consider a legal sequence, or do they use some other system?

richardnunziata

The FIRST thing I did after training my first LSTM was teaching it to debate another bot 😅 Fast forward two years and now we're here... I think the most fun was using Gemini's free API to do this a few months back, creating a swarm of agents that debate and come up with refined outputs (see the sketch below). I fully believe that these methods, in tandem with other ensemble methods, dramatically increase the quality of the output.

mattwesney
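
A rough sketch of the agent-swarm idea in the comment above: several agents draft answers, critique the pool, revise, and a final call synthesizes the result. The `ask` callable stands in for whatever completion API you use (the commenter mentions Gemini's free API); the prompts and loop structure are illustrative guesses, not the commenter's code.

```python
from typing import Callable

def swarm_refine(question: str, ask: Callable[[str], str],
                 n_agents: int = 3, rounds: int = 2) -> str:
    """Debate-and-refine ensemble: agents see the whole answer pool,
    argue against the others, and rewrite their own answers."""
    answers = [ask(f"Answer concisely: {question}") for _ in range(n_agents)]
    for _ in range(rounds):
        pool = "\n".join(f"- {a}" for a in answers)
        answers = [
            ask(f"Question: {question}\nAll current answers:\n{pool}\n"
                f"Your answer was: {answers[i]}\n"
                "Point out flaws in the others, then write an improved answer.")
            for i in range(n_agents)
        ]
    pool = "\n".join(f"- {a}" for a in answers)
    return ask(f"Question: {question}\nRefined answers:\n{pool}\n"
               "Synthesize the single best final answer.")
```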

This episode is an instant favourite, thank you so much

paxdriver

A lot of people confuse knowing lots of things with being "smart".
Smartness is about combining wisdom with creativity and, ideally, empathy.
LLMs just regurgitate things the system has been trained on.

toi_techno

Hallucinations are just the result of the random nature of how tokens are chosen by the model. The higher the temperature, the more randomness (hallucinations) you get.

TheMCDStudio
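
To make the temperature point concrete, here is a small self-contained illustration of softmax sampling with temperature. The toy logits are made up and real decoders add many complications, but the flattening effect is the same.

```python
import math
import random

def sample_with_temperature(logits: dict[str, float], temperature: float) -> str:
    """Divide logits by the temperature, softmax, then sample: higher
    temperature flattens the distribution, boosting unlikely tokens."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

toy_logits = {"Paris": 5.0, "London": 2.0, "Narnia": 0.5}
print(sample_with_temperature(toy_logits, 0.2))  # almost always "Paris"
print(sample_with_temperature(toy_logits, 2.0))  # "Narnia" appears noticeably often
```

At very low temperature this approaches greedy decoding, picking the top token almost every time.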

Finally, someone with an actual brain on your channel; literally the first guy who openly says that ASI is dangerous and gives a very normal explanation.

rtnjo

Things that have intrinsic value _to me_ tend not to have market value.

fburton

Examples of intrinsic value vs market value: loyalty to a friend; the earth's future environment vs revenues from dirty industry; a free book or album; foreign aid, oftentimes; an unused flagship cellphone that's a 4-year-old model; cake, which has a huge differential between intrinsic value (the experience) and the price, and can be either cheap or overpriced for that experience too.

There are so many more, too. Intrinsic value is the value of something just for existing, so even without being a good one of anything, the intrinsic value is that baseline regardless. The market value is based on immediate supply and demand, or the benefit of its access/ownership, or the speculated future potential of price/benefit with a factor of certainty of that future value tacked on. A human life is always worth at least one life of any human, which is worth more than any rock... but the life of one human who may be able to prevent a zombie apocalypse can become more valuable than all other humans, given a certainty of potential. The intrinsic value of a human makes slavery illegal in all instances, but the market value of a human would be set by bidders were it not for recognition of the intrinsic values in a human; the presupposition of any and all inalienable rights is that they are prescribed intrinsically to all humans just by virtue of being human. Intrinsic to any bachelor is an unmarried man lol.

paxdriver

"Evolution famously failed to find the wheel for a very long time" 😂

RevealAI-

An LLM is just statistical prediction software for words; it doesn't deceive you or hallucinate anything or have ideas, because it is not a person. It just receives an input, the model weights generate an output based on some probabilities, and that output can be valuable to us or not depending on whether it was able to retrieve something we find satisfactory from its model weights. Because the output is words, as humans we like to anthropomorphize it. However, it will never be 'smarter than you' because it is not smart at all. In fact, a regular database is better at giving you a deterministic answer, if that's what you want. The model weights may allow it to output information that is more specialised than you, IF its model weights have been trained in that field and you have not. Just as a Google search could do, and probably better, for many tasks.
'Performance' of LLMs is plateauing, and it is yet to be demonstrated that the output of statistically predicted words can be transferred to the task of reasoning. Other than at the margin, it does not seem that having 2 or even 50 LLMs 'argue' over the answer would have any bearing on this, even if it's fun to imagine.

uebiuiq

Love the new "experiment-driven" approach! Using somewhat narrower examples to illustrate current directions in ML research feels like a really productive way forward...
Btw, I don't think the rationalist crowd is necessarily "too worried" about agency in LLMs; imho that's still a minority position, with the vast majority of them just putting (possibly too much of) an emphasis on uncertainty...

yurona

Politicians have been getting people smarter than they are to do what they want forever.

scottmiller

MLST's skill for incisive framing and questioning to elicit deeply informative expert testimony is on full display here! Fantastic.

oncedidactic

I'm going to have to do a "like" washout now. Enjoyed the talk. The reason Nature never invented the wheel is that you have to invent roads first; wheels are pretty useless without them.

scottmiller

If you define AI alignment to be what is in the text, then these systems will fail. Much of what aligns humans is not in the text but in the living (experiences). Text does not cover the billions of individuals, and their experiences, who never read, let alone wrote, anything in your corpus. There are many programs of truth and belief that are in conflict between and within cultures, as well as within individuals. Defining alignment is like defining who is the best artist.

richardnunziata

There's no such thing as intrinsic value. Value is a property of the valuer, not the valued.

MalachiMarvin