AI RESEARCH

MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training

arXiv cs.LG

arXiv:2605.26842v1 Announce Type: new

The Muon optimizer has recently offered a promising alternative to AdamW for large language model