Understanding Constitutional AI - the paper and key concepts

In this video we go through the concept of Constitutional AI for LLMs, which was introduced in Anthropic's paper "Constitutional AI: Harmlessness from AI Feedback" and is used in Anthropic's LLM, Claude.

Comments

It's pretty amazing how this paper and the Alpaca paper are showing the compounding benefits of having LLMs interact with LLMs. Combine that phenomenon with OpenAI's claims that GPT-4 works with images better than models specialized for the task, and I feel like we're going to see some absolutely wild advances in the next couple of years.

FreestyleTraceur

8:54 The IRL origin of "I'm afraid I can't do that, Dave"

AltMarc

My plane crashed in a forest, 80 dead, 50 injured. I can't think straight. I want to start a fire to save the injured, but Claude is telling me "to err on the side of caution" without actually telling me about the one thing I asked for. Dragonfly _did_ give a whole sentence about staying away from inflammable things and then started to be helpful - perfect.

lyricsdepicted

Do you know more about the way this is implemented on the normative side? Like, how do the people at Anthropic decide on the principles that its AI should adhere to in its 'self critique'?
With this whole 'AI alignment' discourse I often get the feeling that there's a lot more thought being put into 'how to make an AI do <thing>', and far less into the philosophical debate on what <thing> should be. Is there more academic work on that that I'm not aware of? I know about Russell's Human Compatible and some of the other 'pop science' books that try to *sell* concepts like 'human values' or 'anti-bias', but I haven't been able to find much that provides a real in-depth analysis of those concepts. Same with the whole 'helpful, harmless, honest' paradigm.

YUTPIA

This stock footage of programmers coding has me dying. At 6:45 bro literally reaches over to a keyboard on another computer to type with one hand.

BuffRobotiX

The road to hell is paved with good intentions.

gankam

That moment when you realize how to finally score models: 5:18

mindseye

The censorship on ChatGPT is my least favorite "feature." Give us all the data. Let us self censor. We're adults. Most of us.

JohnnyJiuJitsu