Improving Adversarial Robustness of Attribution via Implicit Regularization

arXiv:2605.29983v1

The adversarial robustness of attributions is a fundamental requirement for reliable explainability in deep learning, yet existing approaches typically rely on computationally expensive explicit regularization. In this work, we show that attribution robustness can arise implicitly from the learning dynamics of standard stochastic gradient descent.
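
To make the notion of attribution robustness concrete, the following is a minimal sketch of one way it might be probed, not the method of this paper. It assumes a hypothetical two-layer tanh network, plain input-gradient (saliency) attributions, cosine similarity as the agreement measure, and random perturbations inside an L-infinity ball as a cheap stand-in for a true adversarial search. All function names and parameter values are illustrative.

```python
import numpy as np

def saliency(W1, b1, w2, x):
    """Input-gradient attribution of f(x) = w2 . tanh(W1 x + b1)."""
    h = np.tanh(W1 @ x + b1)
    # Chain rule: df/dx = W1^T (w2 * (1 - tanh^2)).
    return W1.T @ (w2 * (1.0 - h ** 2))

def cosine_similarity(a, b):
    """Cosine similarity with a small constant guarding against zero norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def worst_case_similarity(W1, b1, w2, x, eps=0.05, trials=256, seed=0):
    """Lowest similarity between the attribution at x and at x + delta,
    over random delta with ||delta||_inf <= eps (assumption: random search,
    which only upper-bounds the similarity under a real attack)."""
    rng = np.random.default_rng(seed)
    base = saliency(W1, b1, w2, x)
    worst = 1.0
    for _ in range(trials):
        delta = rng.uniform(-eps, eps, size=x.shape)
        perturbed = saliency(W1, b1, w2, x + delta)
        worst = min(worst, cosine_similarity(base, perturbed))
    return worst

# Hypothetical random weights and input, purely for illustration.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(16, 8))
b1 = rng.normal(size=16)
w2 = rng.normal(size=16)
x = rng.normal(size=8)

score = worst_case_similarity(W1, b1, w2, x)
print(f"worst-case attribution similarity: {score:.4f}")
```

A score close to 1 would suggest the attribution changes little under the sampled perturbations; explicit regularizers typically add a penalty on such a gap to the training loss, which is the extra cost the abstract refers to.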