Model Unlearning Objectives Vary for Distinct Language Functions

arXiv:2605.26454v1 Announce Type: new Because large language models (LLMs) use different objectives to shape different behaviors, we argue that unlearning methods should be designed for the language function at issue. To study this, we consider two mechanistically distinct unlearning goals: dangerous-knowledge unlearning and toxicity unlearning. For dangerous knowledge, we