AI RESEARCH

Move on Muon: A Hamiltonian probability gradient flow perspective of Muon optimizer

arXiv cs.LG

arXiv:2605.23871v1 Announce Type: cross We develop a gradient flow on the space of probability measures over matrix-valued parameters, induced by regularized Muon, an analytically smoothed version of the idealized Muon optimizer. The key observation is that the regularized orthogonalization map is the gradient of a smooth Fenchel-dual smoothing of the nuclear norm. This identifies the (regularized) Muon update as a mirror/prox step in the update variable, with momentum acting as the dual coordinate.
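The key observation can be checked numerically with a minimal sketch. The abstract does not state which smoothing the paper uses, so the code below assumes one common choice: the surrogate sum_i sqrt(sigma_i^2 + eps) for the nuclear norm, whose gradient is U diag(sigma_i / sqrt(sigma_i^2 + eps)) V^T. The function names, the value of `eps`, and the momentum rule in `muon_like_step` are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def smoothed_nuclear_norm(M, eps):
    """Assumed surrogate: sum_i sqrt(sigma_i^2 + eps), smooth for eps > 0."""
    s = np.linalg.svd(M, compute_uv=False)
    return float(np.sum(np.sqrt(s**2 + eps)))

def regularized_orthogonalize(M, eps):
    """U diag(sigma / sqrt(sigma^2 + eps)) V^T, the gradient of the surrogate."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * (s / np.sqrt(s**2 + eps))) @ Vt

def finite_difference_gradient(f, M, h=1e-6):
    """Entrywise central differences, used only to check the analytic gradient."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = h
        G[idx] = (f(M + E) - f(M - E)) / (2 * h)
    return G

def muon_like_step(W, B, grad, lr=0.02, mu=0.95, eps=1e-2):
    """Hypothetical update: momentum B is the dual variable, and the step
    direction is the gradient of the smoothed nuclear norm evaluated at B."""
    B = mu * B + grad
    W = W - lr * regularized_orthogonalize(B, eps)
    return W, B

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))
eps = 1e-2

# The regularized map should match the numerical gradient of the surrogate.
analytic = regularized_orthogonalize(M, eps)
numeric = finite_difference_gradient(lambda X: smoothed_nuclear_norm(X, eps), M)
grad_error = float(np.max(np.abs(analytic - numeric)))

# As eps shrinks, the map should approach the idealized orthogonalization U V^T.
U, _, Vt = np.linalg.svd(M, full_matrices=False)
limit_error = float(np.max(np.abs(regularized_orthogonalize(M, 1e-12) - U @ Vt)))

# Every singular value of the step direction is strictly below 1.
spectral_norm = float(np.linalg.norm(analytic, 2))

W, B = muon_like_step(np.zeros((4, 3)), np.zeros((4, 3)), M)
```

Under this assumed smoothing, the update direction always has spectral norm below 1, and it recovers the idealized Muon direction U V^T in the limit eps -> 0.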