AI RESEARCH

Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models

arXiv CS.AI

arXiv:2605.27997v1 Announce Type: cross Large language models frequently generate toxic, hateful, or harmful content, yet existing mitigation methods rely on costly re