The Assistant as a Privileged Persona: A canonical reference in cross-persona self-recognition

arXiv:2606.00545v1

Post-trained language models can recognize their own outputs from a sentence or two out of context. In a companion paper \citep{jack2026twomodes}, we showed that they can also recognize when they are acting on-policy, through the sharp entropy drop of assistant-mode generation. Both signals are tied to the Assistant persona that post-