[QA] Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
This paper analyzes preference alignment and jailbreaking in large language models, proposing E-RLHF as a cost-effective method to enhance safety without compromising performance.