Outer-Momentum Restarting in High-Dimensional Two-Phase Optimization
arXiv cs.LG
arXiv:2605.28585v1 Announce Type: new

Communication-efficient distributed optimizers such as DiLoCo reduce synchronization costs by letting workers perform many local updates before aggregating their progress with an outer momentum optimizer. Recent theory suggests that the outer optimizer acts on an effective spectrum induced by the inner optimization loop, and that the choice of outer momentum controls how progress from local updates accumulates across communication rounds.
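To make the two-phase structure concrete, here is a minimal Python sketch of a DiLoCo-style loop on a toy problem. Everything in it is an illustrative assumption rather than the paper's setup. Each worker minimizes its own diagonal quadratic (shared curvatures `lam`, worker-specific optima). Plain gradient descent stands in for the inner optimizer. The outer step applies Nesterov momentum to the averaged pseudo-gradient, defined as the shared start point minus each worker's local end point. On this toy problem, K inner steps with learning rate η map each Hessian eigenvalue λ to 1 - (1 - ηλ)^K. That mapping is one simple way to picture an "effective spectrum induced by the inner loop". The function names, worker optima, and hyperparameter values are hypothetical.

```python
# Toy DiLoCo-style two-phase optimizer (hypothetical sketch, not the paper's method).
# Assumptions: every worker m minimizes f_m(y) = 0.5 * sum_i lam[i] * (y[i] - c_m[i])**2
# with the SAME curvatures lam, the inner optimizer is plain gradient descent, and the
# outer optimizer is Nesterov momentum on the averaged pseudo-gradient.
from typing import List, Sequence

Vec = List[float]


def inner_loop(x: Sequence[float], lam: Sequence[float],
               center: Sequence[float], lr: float, steps: int) -> Vec:
    """Run `steps` local gradient-descent updates on one worker's quadratic."""
    y = list(x)
    for _ in range(steps):
        y = [yi - lr * li * (yi - ci) for yi, li, ci in zip(y, lam, center)]
    return y


def pseudo_gradient(x: Sequence[float], lam: Sequence[float],
                    centers: Sequence[Sequence[float]], lr: float, steps: int) -> Vec:
    """Average over workers of (shared start point - local end point)."""
    ends = [inner_loop(x, lam, c, lr, steps) for c in centers]
    n = len(ends)
    return [sum(x[i] - e[i] for e in ends) / n for i in range(len(x))]


def effective_spectrum(lam: Sequence[float], lr: float, steps: int) -> Vec:
    """Per-direction curvature seen by the outer optimizer on this toy problem.

    K inner GD steps shrink the offset from a worker's optimum by (1 - lr*lam)**K,
    so the averaged pseudo-gradient equals
    (1 - (1 - lr*lam)**K) * (x - mean_center) in each eigendirection.
    """
    return [1.0 - (1.0 - lr * li) ** steps for li in lam]


def diloco(x0: Sequence[float], lam: Sequence[float],
           centers: Sequence[Sequence[float]], inner_lr: float, inner_steps: int,
           outer_lr: float = 0.7, beta: float = 0.9, rounds: int = 300) -> Vec:
    """Outer loop: Nesterov momentum on the averaged pseudo-gradient.

    Uses the same update rule as torch.optim.SGD(nesterov=True, dampening=0):
    buf <- beta * buf + g;  x <- x - lr * (g + beta * buf).
    """
    x = list(x0)
    buf = [0.0] * len(x)
    for _ in range(rounds):  # one iteration = one communication round
        delta = pseudo_gradient(x, lam, centers, inner_lr, inner_steps)
        buf = [beta * b + d for b, d in zip(buf, delta)]
        x = [xi - outer_lr * (d + beta * b) for xi, d, b in zip(x, delta, buf)]
    return x


if __name__ == "__main__":
    lam = [1.0, 0.1, 0.01]            # shared Hessian eigenvalues (hypothetical)
    centers = [[1.0, -2.0, 3.0],      # hypothetical per-worker optima
               [3.0, 0.0, -1.0]]
    print(effective_spectrum(lam, lr=0.5, steps=10))
    print(diloco([0.0, 0.0, 0.0], lam, centers, inner_lr=0.5, inner_steps=10))
```

With these illustrative settings, the stiffest direction (λ = 1) is mapped to roughly 0.999 while the flattest (λ = 0.01) stays near 0.05. The outer loop converges to the mean of the worker optima, [2.0, -1.0, 1.0]. Swapping in other outer momentum values is an easy way to see, on this toy, how momentum changes the per-round contraction in each effective direction.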