AI RESEARCH
MONA: Muon Optimizer with Nesterov Acceleration for Scalable Language Model Training
arXiv cs.LG
arXiv:2605.26842v1 Announce Type: new The Muon optimizer has recently offered a promising alternative to AdamW for large language model