filmov
tv
[QA] Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

Показать описание
The paper investigates extreme-token phenomena in transformer-based LLMs, revealing mechanisms behind attention sinks and proposing strategies to mitigate their impact during pretraining.