Negligible in Size, Significant in Effect: On Scale Vectors in Large Language Models

arXiv:2605.26895v1 Announce Type: cross

Normalization layers in modern large language models (LLMs) consist of a deterministic normalization operation and a learnable scale vector. While the normalization operation has been extensively studied, the scale vector remains poorly understood despite its ubiquitous use. In this work, we present a systematic study of scale vectors in LLMs from the perspectives of expressivity, optimization, and architectural structure.
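
To make the two components concrete, here is a minimal sketch of a normalization layer split into its deterministic part and its learnable scale vector. The abstract does not name a specific normalization, so RMSNorm is assumed here purely for illustration; the function names `rms_normalize` and `rms_norm`, the `eps` value, and the example inputs are hypothetical and not taken from the paper.

```python
import math
from typing import List


def rms_normalize(x: List[float], eps: float = 1e-6) -> List[float]:
    """Deterministic part: divide each entry by the root-mean-square of x.

    Assumption: RMS-style normalization, chosen only as an example.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def rms_norm(x: List[float], scale: List[float], eps: float = 1e-6) -> List[float]:
    """Full layer: normalize, then multiply elementwise by the scale vector.

    `scale` is the learnable parameter, with one entry per hidden dimension.
    """
    if len(x) != len(scale):
        raise ValueError("scale must have one entry per hidden dimension")
    return [g * v for g, v in zip(scale, rms_normalize(x, eps))]


# Toy hidden state with 4 dimensions (illustrative values only).
hidden = [3.0, -4.0, 0.0, 0.0]

# With an all-ones scale vector, the layer reduces to the deterministic part.
identity_scale = [1.0, 1.0, 1.0, 1.0]
plain = rms_norm(hidden, identity_scale)

# A non-trivial scale vector rescales each dimension independently.
learned_scale = [2.0, 0.5, 1.0, 1.0]
rescaled = rms_norm(hidden, learned_scale)
```

In this sketch the scale vector adds only one parameter per hidden dimension, which is small next to the weight matrices of a typical LLM block, yet it rescales every dimension of the normalized activations before they reach the next layer.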