[QA] Mission Impossible: A Statistical Perspective on Jailbreaking LLMs

This paper analyzes preference alignment and jailbreaking in large language models, proposing E-RLHF as a cost-effective method to enhance safety without compromising performance.
