Don't Read Everything: A Curvature-Conditioned Query for Linear Attention

arXiv:2606.01294v1 Announce Type: cross

Linear attention reduces the quadratic cost of softmax attention by maintaining a recurrent fast-weight state, but it consistently lags softmax attention on in-context retrieval and long-context tasks. Existing remedies act on the write side of memory through gating, delta updates, or kernel feature maps, but they leave the read step unchanged: every past key contributes additively to the output, so useful targets are diluted by the bulk of d vectors. We borrow one specific piece of softmax's geometry to construct a cheap read-time contraction of the query.
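To make the additive read concrete, here is a minimal sketch of textbook causal linear attention, assuming an identity feature map and no normalization. It is not the method proposed in the paper: the function names and the toy inputs `Q`, `K`, `V` are illustrative assumptions. The recurrent form writes each key-value pair into a state matrix and reads it with the query; the explicit form shows that this read is a sum over all past keys.

```python
# Minimal sketch of standard (unnormalized, causal) linear attention.
# Assumption: identity feature map; this is NOT the paper's read-time contraction.

def linear_attention_recurrent(queries, keys, values):
    """Fast-weight form: S_t = S_{t-1} + k_t v_t^T, then o_t = S_t^T q_t."""
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]  # fast-weight state S, shape d_k x d_v
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Write step: add the outer product k v^T to the state.
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # Read step: project the state onto the query.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs


def linear_attention_explicit(queries, keys, values):
    """Equivalent form: o_t = sum over i <= t of (q_t . k_i) v_i."""
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        out = [0.0] * d_v
        for k, v in zip(keys[: t + 1], values[: t + 1]):
            weight = sum(a * b for a, b in zip(q, k))  # unnormalized score q . k
            for j in range(d_v):
                out[j] += weight * v[j]  # every past key contributes additively
        outputs.append(out)
    return outputs


# Hypothetical toy inputs: the third key overlaps with the first one.
K = [[1, 0], [0, 1], [1, 1]]
V = [[1, 0], [0, 1], [2, 2]]
Q = [[1, 0], [0, 1], [1, 0]]

recurrent_out = linear_attention_recurrent(Q, K, V)
explicit_out = linear_attention_explicit(Q, K, V)
# recurrent_out == explicit_out == [[1.0, 0.0], [0.0, 1.0], [3.0, 2.0]]
```

In this toy case the last query matches the first key exactly, yet its output is `[3.0, 2.0]` rather than `[1.0, 0.0]`, because the overlapping third key adds its value to the sum. This is the dilution effect the abstract describes, at the smallest possible scale.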