AI RESEARCH

Do Transformers Need Three Projections? Systematic Study of QKV Variants

arXiv CS.AI

arXiv:2606.04032v1 Announce Type: cross Transformers have become the standard solution for various AI tasks, with the query, key, and value (QKV) attention formulation playing a central role. However, the individual contribution of these three projections and the impact of omitting some remain poorly understood. We systematically evaluate three projection sharing constraints: a) Q-K=V (shared key-value), b) Q=K-V (shared query-key), and c) Q=K=V (single projection
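
To make the three sharing constraints concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python, in which a shared projection is simply the same weight matrix reused. It is an illustration under assumptions, not the paper's implementation: the function names (`make_projections`, `attention`, `count_distinct`), the variant labels, the square `d_model x d_model` weights, and the absence of biases, masking, and multiple heads are all hypothetical choices made here.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Gaussian matrix scaled by 1/sqrt(rows) (an arbitrary init for this sketch)."""
    return [[rng.gauss(0.0, 1.0) / math.sqrt(rows) for _ in range(cols)]
            for _ in range(rows)]


def make_projections(d_model, variant, rng):
    """Return (W_q, W_k, W_v); shared projections are the same object.

    Variant labels are hypothetical names for the constraints in the abstract.
    """
    w_a = random_matrix(d_model, d_model, rng)
    if variant == "full":        # three independent projections
        return w_a, random_matrix(d_model, d_model, rng), random_matrix(d_model, d_model, rng)
    if variant == "shared_kv":   # a) key and value share one projection
        w_b = random_matrix(d_model, d_model, rng)
        return w_a, w_b, w_b
    if variant == "shared_qk":   # b) query and key share one projection
        w_b = random_matrix(d_model, d_model, rng)
        return w_a, w_a, w_b
    if variant == "single":      # c) one projection used for all three roles
        return w_a, w_a, w_a
    raise ValueError(f"unknown variant: {variant}")


def attention(x, w_q, w_k, w_v):
    """Single-head attention; returns (output, raw pre-softmax scores)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    k_t = [list(col) for col in zip(*k)]          # transpose of K
    scores = matmul(q, k_t)                       # Q K^T, shape seq x seq
    scale = math.sqrt(len(q[0]))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), scores


def count_distinct(projections):
    """Number of distinct weight matrices among (W_q, W_k, W_v)."""
    return len({id(w) for w in projections})


rng = random.Random(0)
x = random_matrix(4, 8, rng)  # 4 tokens, d_model = 8
for variant in ("full", "shared_kv", "shared_qk", "single"):
    projections = make_projections(8, variant, rng)
    output, _ = attention(x, *projections)
    print(variant, count_distinct(projections), len(output), len(output[0]))
```

One consequence visible in this sketch: whenever the query and key share a projection (variants b and c), the pre-softmax score matrix is symmetric, because the score for tokens i and j is the dot product of the same two projected vectors in either order.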