Continuous-Depth Field Theory for Transformer Patching and Mechanistic Interpretability

arXiv:2605.25225v1 Announce Type: cross Mechanistic interpretability often uses activation patching, causal tracing, path patching, and steering directions to reveal behaviorally meaningful directions in Transformer activation space. This paper develops a field-theoretic framework for organizing and predicting such interventions.