AI RESEARCH
The Implicit Bias of Adam and Muon on Smooth Homogeneous Neural Networks
arXiv cs.LG
arXiv:2602.16340v3. Announce Type: replace.
We study the implicit bias of momentum-based optimizers on smooth homogeneous models. We show that momentum steepest descent algorithms such as Muon (spectral norm), MomentumGD ($\ell_2$ norm), and Signum ($\ell_\infty$ norm) follow approximate steepest descent trajectories under a decaying learning rate schedule, proving that these algorithms have a bias towards KKT points of the corresponding margin maximization problem.
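To make the idea of momentum steepest descent concrete, here is a minimal, dependency-free sketch. It is not the paper's implementation. It assumes the normalized form of the update (step along the unit-ball maximizer of the momentum buffer), an exponential loss on a linear model, which is a simple smooth homogeneous model, and a learning rate decaying as `lr0 / sqrt(t)`. The function names, toy data, and hyperparameters are hypothetical. Only the $\ell_\infty$ (Signum-style) and $\ell_2$ cases are shown; the spectral-norm case used by Muon would instead orthogonalize a momentum matrix and is omitted here.

```python
import math

def steepest_direction(m, norm):
    """Maximizer of <m, d> over the unit ball of `norm` (normalized direction)."""
    if norm == "linf":
        # Signum-style: over the l_inf unit ball the maximizer is sign(m).
        return [float((x > 0) - (x < 0)) for x in m]
    if norm == "l2":
        # Normalized momentum GD: over the l_2 unit ball it is m / ||m||_2.
        n = math.sqrt(sum(x * x for x in m))
        return [x / n for x in m] if n > 0 else [0.0] * len(m)
    raise ValueError(f"unsupported norm: {norm}")

def exp_loss(w, X, y):
    """Exponential loss sum_i exp(-y_i <w, x_i>) of a linear model."""
    return sum(
        math.exp(-yi * sum(wj * xj for wj, xj in zip(w, xi)))
        for xi, yi in zip(X, y)
    )

def exp_loss_grad(w, X, y):
    """Gradient of exp_loss with respect to w."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        c = -yi * math.exp(-yi * sum(wj * xj for wj, xj in zip(w, xi)))
        for j, xj in enumerate(xi):
            g[j] += c * xj
    return g

def momentum_steepest_descent(X, y, norm, steps=200, beta=0.9, lr0=0.1):
    """Run the assumed update: EMA momentum, then a unit-norm steepest step."""
    w = [0.0] * len(X[0])
    m = [0.0] * len(w)
    for t in range(1, steps + 1):
        g = exp_loss_grad(w, X, y)
        # Exponential moving average of gradients (momentum buffer).
        m = [beta * mi + (1 - beta) * gi for mi, gi in zip(m, g)]
        d = steepest_direction(m, norm)
        lr = lr0 / math.sqrt(t)  # decaying schedule (assumed form)
        w = [wi - lr * di for wi, di in zip(w, d)]
    return w

# Hypothetical, linearly separable toy data with all labels +1.
X = [(2.0, 0.5), (0.5, 1.0)]
y = [1.0, 1.0]
w_signum = momentum_steepest_descent(X, y, "linf")
w_l2 = momentum_steepest_descent(X, y, "l2")
```

On this toy data every gradient coordinate stays negative, so the sign-based update moves both coordinates by the same amount at each step, while the $\ell_2$ variant follows the direction of the momentum buffer. Comparing `w_signum` and `w_l2` after normalization gives a rough feel for how the choice of norm shapes the direction the iterates drift towards.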