AI RESEARCH

HORST: Composing Optimizer Geometries for Sparse Transformer Training

arXiv CS.LG

arXiv:2605.21104v1 Announce Type: new Sparsifying transformers remains a fundamental challenge, as standard optimizers fail to simultaneously encourage sparsity and maintain