AI RESEARCH
Where Does Toxicity Live? Mechanistic Localization and Targeted Suppression in Language Models
arXiv CS.AI
arXiv:2605.27997v1 Announce Type: cross Large language models frequently generate toxic, hateful, or harmful content, yet existing mitigation methods rely on costly re