AI RESEARCH
Low-Rank Decay for Grokking in Scale-Invariant Transformers: A Spectral-Geometric View
arXiv CS.AI
arXiv:2606.04405v1 Announce Type: cross Modern Transformer architectures frequently employ normalization mechanisms such as RMSNorm and Query-Key Normalization, which make parts of the model approximately scale-invariant with respect to weight magnitudes. In this regime, standard Frobenius-norm weight decay acts purely along the radial direction of weight space: it shrinks the norm of the weights but cannot directly simplify the function represented by the normalized layer.
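The scale-invariance claim can be checked numerically. The following is a minimal sketch, not the paper's code: it assumes a single linear map followed by an RMSNorm with no learnable gain and `eps = 0`, so the invariance is exact rather than approximate. The names `rms_norm`, `layer`, `lam`, and `lr` are hypothetical and chosen only for illustration.

```python
import numpy as np

def rms_norm(z):
    # RMSNorm without learnable gain and with eps = 0 (assumption for exactness).
    return z / np.sqrt(np.mean(z ** 2, axis=-1, keepdims=True))

def layer(W, x):
    # Linear map followed by RMSNorm: the output depends only on the direction of W.
    return rms_norm(x @ W.T)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # hypothetical weight matrix
x = rng.standard_normal((4, 16))   # hypothetical batch of inputs

out = layer(W, x)

# Rescaling W by a positive constant leaves the normalized output unchanged.
scale_invariant = np.allclose(out, layer(3.0 * W, x))

# Frobenius weight decay: the gradient of (lam / 2) * ||W||_F^2 is lam * W,
# so one decay step is the pure radial rescale W -> (1 - lr * lam) * W.
lam, lr = 0.1, 0.5
W_decayed = W - lr * lam * W
decay_changes_function = not np.allclose(layer(W_decayed, x), out)

print("scale invariant:", scale_invariant)
print("decay step changes the function:", decay_changes_function)
```

In this toy setting the decay step only multiplies `W` by `0.95`, so the layer output is the same before and after. Real implementations use a small positive `eps` and often a learnable gain, which is why the invariance described above is only approximate.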