Policy and World Modeling Co-Training for Language Agents

arXiv:2606.02388v1 Announce Type: cross

Reinforcement learning (RL) improves large language model (LLM) agents by teaching them which actions lead to high rewards, but it provides little supervision on what those actions do to the environment. World modeling (WM) can fill this gap, yet existing approaches often require separate simulators, extra