EMA-Nesterov: Stabilizing Nesterov's Lookahead for Accelerated Deep Learning Optimization

arXiv:2605.25395v1 Announce Type: new

Lookahead-based acceleration methods, such as Nesterov's momentum, are widely used in optimization, but they often become unreliable in practice, mainly because of stochastic gradient noise and non-convex loss landscapes. In particular, standard lookahead relies on short-horizon update signals (e.g., differences between consecutive iterates), which are inherently noisy and can produce unstable extrapolation directions.
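To make the "short-horizon update signal" concrete, here is a minimal sketch of one common form of Nesterov's lookahead on a toy one-dimensional quadratic f(x) = 0.5·x², assuming the update y_t = x_t + β(x_t − x_{t−1}) followed by x_{t+1} = y_t − η·g(y_t), where g is the true gradient plus additive Gaussian noise. The function names, the toy objective, and the `ema_decay` option are illustrative assumptions. In particular, the EMA-smoothed direction is a hypothetical stand-in for the idea named in the title, not the paper's EMA-Nesterov algorithm.

```python
import random
import statistics


def noisy_grad(x: float, noise_std: float, rng: random.Random) -> float:
    """Stochastic gradient of the toy objective f(x) = 0.5 * x**2: true gradient x plus Gaussian noise."""
    return x + rng.gauss(0.0, noise_std)


def nesterov_lookahead(x0, lr, beta, steps, noise_std=0.0, ema_decay=None, seed=0):
    """Nesterov-style updates on f(x) = 0.5 * x**2 (hypothetical helper for illustration).

    Each step forms the lookahead point y_t = x_t + beta * d_t and then sets
    x_{t+1} = y_t - lr * g(y_t), where g is a (possibly noisy) gradient.
    - ema_decay=None: d_t = x_t - x_{t-1}, the standard one-step difference.
    - ema_decay in [0, 1): d_t is an exponential moving average of those
      differences (an assumed smoothing, not the paper's exact rule).
    Returns (iterates, directions).
    """
    if ema_decay is not None and not 0.0 <= ema_decay < 1.0:
        raise ValueError("ema_decay must lie in [0, 1)")
    rng = random.Random(seed)
    x_prev, x = x0, x0
    d = 0.0
    xs, ds = [x], []
    for _ in range(steps):
        diff = x - x_prev                      # short-horizon update signal
        if ema_decay is None:
            d = diff                           # standard Nesterov direction
        else:
            d = ema_decay * d + (1.0 - ema_decay) * diff  # smoothed direction
        y = x + beta * d                       # extrapolated (lookahead) point
        g = noisy_grad(y, noise_std, rng)      # gradient taken at the lookahead point
        x_prev, x = x, y - lr * g
        xs.append(x)
        ds.append(d)
    return xs, ds


if __name__ == "__main__":
    burn_in = 500  # discard the initial transient before measuring fluctuations
    _, d_raw = nesterov_lookahead(1.0, lr=0.1, beta=0.9, steps=2000, noise_std=1.0, seed=1)
    _, d_ema = nesterov_lookahead(1.0, lr=0.1, beta=0.9, steps=2000,
                                  noise_std=1.0, ema_decay=0.9, seed=1)
    print("variance of raw difference direction:", statistics.pvariance(d_raw[burn_in:]))
    print("variance of EMA-smoothed direction:  ", statistics.pvariance(d_ema[burn_in:]))
```

Under gradient noise, the raw one-step difference fluctuates far more than its exponentially averaged counterpart in this toy setting. Part of that gap comes from the averaging itself shrinking and delaying the direction, a trade-off this simple comparison does not attempt to separate.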