The SuperActivator Mechanism: Transformers Concentrate Reliable Concept Signals in the Tail

arXiv:2512.05038v2 Announce Type: replace

Concept vectors aim to enhance model interpretability by linking internal representations with human-understandable semantics, but their practical utility is often limited by noisy and inconsistent activations. In this work, we uncover the SuperActivator Mechanism: a transformer dynamic that amplifies concept activation gaps, concentrating the most reliable concept evidence into a small set of high-activation tokens.
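
To make the idea of a "small set of high-activation tokens" concrete, here is a minimal sketch. It is not the paper's procedure, which this abstract does not spell out. It assumes that a concept activation is the dot product between a token representation and a concept vector, and that the tail is the top fraction of tokens ranked by that activation. The function names and the `tail_fraction` parameter are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def concept_activations(token_vecs: Sequence[Sequence[float]],
                        concept_vec: Sequence[float]) -> List[float]:
    """Assumed activation: dot product of each token vector with the concept vector."""
    return [sum(t * c for t, c in zip(tok, concept_vec)) for tok in token_vecs]


def tail_token_indices(activations: Sequence[float],
                       tail_fraction: float = 0.1) -> List[int]:
    """Indices of the highest-activation tokens, strongest first.

    `tail_fraction` is a hypothetical knob: the share of tokens treated as the tail.
    At least one token is always returned.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    k = max(1, int(len(activations) * tail_fraction))
    order = sorted(range(len(activations)),
                   key=lambda i: activations[i], reverse=True)
    return order[:k]


def tail_score(activations: Sequence[float], tail_fraction: float = 0.1) -> float:
    """Mean activation over the tail tokens only, ignoring the noisier bulk."""
    idx = tail_token_indices(activations, tail_fraction)
    return sum(activations[i] for i in idx) / len(idx)


if __name__ == "__main__":
    tokens = [[1, 0], [0, 1], [3, 0], [0.5, 0.5]]
    concept = [1, 0]
    acts = concept_activations(tokens, concept)
    print(tail_token_indices(acts, tail_fraction=0.25))
    print(tail_score(acts, tail_fraction=0.5))
```

Scoring a sequence from its tail alone, rather than averaging over every token, is one simple way to act on the claim that the most reliable evidence sits in the highest activations. Whether a fixed fraction is the right cutoff is an open choice in this sketch.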