Tensor Memory: Fixed-Size Recurrent State for Long-Horizon Transformers

arXiv:2605.27686v1 Announce Type: cross

Transformers process images and videos by flattening space and time into long token sequences. Attention and KV caching preserve past features, but their memory footprint grows with sequence length, and they maintain no explicit, persistent spatial state. Both limitations make long-horizon video understanding and occlusion-sensitive reasoning difficult.
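To make the memory-growth claim concrete, the sketch below estimates the size of a standard KV cache as a function of sequence length. It is a back-of-the-envelope illustration, not something taken from the paper: the function name `kv_cache_bytes`, the model dimensions (24 layers, 16 heads, head dimension 64), the 256 tokens per frame, and the 2-byte (fp16) values are all hypothetical choices made for this example.

```python
def kv_cache_bytes(seq_len, num_layers, num_heads, head_dim, bytes_per_value=2):
    """Approximate bytes needed to cache keys and values for one sequence.

    Assumes standard multi-head attention with one key tensor and one
    value tensor cached per layer (hence the factor of 2).
    """
    return 2 * num_layers * num_heads * head_dim * seq_len * bytes_per_value


# Hypothetical model and tokenization settings, chosen only for illustration.
config = dict(num_layers=24, num_heads=16, head_dim=64)
TOKENS_PER_FRAME = 256

for frames in (16, 256, 4096):
    tokens = frames * TOKENS_PER_FRAME
    mib = kv_cache_bytes(tokens, **config) / 2**20
    print(f"{frames:5d} frames -> {tokens:8d} tokens -> {mib:10.1f} MiB")
```

Under these assumptions the cache size is linear in the number of tokens, so doubling the video length doubles the memory, whereas a fixed-size state would by definition stay constant as the sequence grows.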