Scaling World-Model Reinforcement Learning Through Diffusion Policy Optimization

arXiv:2605.26282v1. Model-based reinforcement learning (RL) can be effectively scaled through the use of world models. In practice, however, scaling such approaches remains fundamentally limited. A commonly recognized challenge is model bias and error compounding, which degrade long-horizon predictions. Beyond these issues, we identify a critical yet underexplored bottleneck: a structural misalignment between search and value learning in existing world-model approaches.