Steering Language Models Before They Speak: Logit-Level Interventions

arXiv:2601.10960v2 Announce Type: replace-cross Controllable generation requires language models to realize output characteristics such as reading level, politeness, and toxicity. Existing steering methods are often indirect, require access to internal activations, or depend on auxiliary trained models. We propose SWAI, a