AI RESEARCH
Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers
arXiv cs.LG
arXiv:2602.10408v2 Announce Type: replace
Normalization layers are standard in transformers, but it is not clear whether their sample-dependent computations are necessary throughout both