Why We Need World Models for AGI: Where LLMs Fail and How World Models May Outperform

arXiv:2605.23972v1 Announce Type: new Large language models achieve strong performance in language generation and knowledge-intensive tasks, yet remain limited in settings requiring causal reasoning, persistent state tracking, and long-horizon planning. We argue that these limitations may arise from an objective-level mismatch between sequence prediction and reasoning over latent environment dynamics. To formalize this distinction, we