AI RESEARCH
Beyond Transfer Accuracy: Faithful Circuits for Controlled Low-Resource Adaptation
arXiv CS.AI
arXiv:2601.08146v3 Announce Type: replace-cross
Existing circuit discovery methods rely on templated tasks with clean counterfactuals, which limits their use on diverse natural text. We adapt Contextual Decomposition for Transformers (CD-T) to unstructured settings via label-balanced activation means and task-directional relevance scoring, enabling counterfactual-free circuit discovery. We then use these circuits for Circuit-Targeted Supervised Fine-Tuning (CT-SFT), which restricts parameter updates to task-relevant heads and LayerNorm.
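
To make two of the ideas above concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes that a "label-balanced activation mean" can be read as the average of per-label mean activations, and that "restricting updates" amounts to choosing which named parameters stay trainable. The function names `label_balanced_mean` and `select_trainable`, and the parameter naming scheme (`layers.<l>.attn.heads.<h>.…`, `ln…`), are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical parameter naming, assumed only for this illustration.
HEAD_RE = re.compile(r"layers\.(\d+)\.attn\.heads\.(\d+)\.")
LN_RE = re.compile(r"(^|\.)ln\w*\.")


def label_balanced_mean(activations, labels):
    """Average the per-label mean vectors so every label counts equally.

    Assumption: this is one plausible reading of "label-balanced";
    the abstract does not spell out the exact formula.
    """
    by_label = defaultdict(list)
    for vec, label in zip(activations, labels):
        by_label[label].append(vec)
    dim = len(activations[0])
    # One mean vector per label, regardless of how many examples it has.
    label_means = [
        [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
        for vecs in by_label.values()
    ]
    # Unweighted average across labels.
    return [sum(m[i] for m in label_means) / len(label_means) for i in range(dim)]


def select_trainable(param_names, circuit_heads):
    """Keep LayerNorm parameters and parameters of heads in the circuit.

    circuit_heads is a set of (layer, head) index pairs.
    """
    keep = []
    for name in param_names:
        match = HEAD_RE.search(name)
        in_circuit = (
            match is not None
            and (int(match.group(1)), int(match.group(2))) in circuit_heads
        )
        if in_circuit or LN_RE.search(name):
            keep.append(name)
    return keep
```

With three examples of one label and one of another, the balanced mean sits halfway between the two label means rather than being pulled toward the majority label. In a training framework, the names returned by `select_trainable` would be the only parameters left with gradients enabled; everything else would be frozen.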