Why Latent Actions Fail, and How to Prevent It

arXiv:2605.20223v1 Announce Type: new Latent action models (LAMs) aim to learn action-like representations from unlabeled videos by compressing frame-to-frame changes. The frames of in-the-wild videos, however, contain not only the agent's own state but also exogenous state, such as background clutter. Since the exogenous state