Next-Latent Prediction Transformers Learn Compact World Models

arXiv:2511.05963v2 Announce Type: replace

Transformers replace recurrence with a memory that grows with sequence length and self-attention that enables ad hoc lookups over past tokens. Consequently, they lack an inherent incentive to compress history into compact latent states with consistent transition rules. As a result, they often learn solutions that generalize poorly. We