AI RESEARCH
HORST: Composing Optimizer Geometries for Sparse Transformer Training
arXiv cs.LG
arXiv:2605.21104v1 Announce Type: new Sparsifying transformers remains a fundamental challenge, as standard optimizers fail to simultaneously encourage sparsity and maintain