Offline Reinforcement Learning with Generative Trajectory Policies

arXiv:2510.11499v2, Announce Type: replace-cross

Generative models have emerged as a powerful class of policies for offline reinforcement learning (RL) because they can capture complex, multi-modal behaviors. However, existing methods face a stark trade-off: slow, iterative models such as diffusion policies are computationally expensive, while fast, single-step models such as consistency policies often suffer from degraded performance. In this paper, we demonstrate that it is possible to bridge this gap.
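The cost side of this trade-off comes down to how many network evaluations are needed to produce one action. The following minimal sketch is not the paper's method. It assumes a toy, untrained update rule in place of a real policy network, and `CountingDenoiser`, `iterative_sample`, and `single_step_sample` are hypothetical names introduced only for illustration. An iterative (diffusion-style) sampler calls the network once per refinement step, while a single-step (consistency-style) sampler calls it once in total. The sketch shows only this difference in call counts, not the difference in action quality.

```python
import numpy as np


class CountingDenoiser:
    """Hypothetical stand-in for a policy network; counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, action, state, t):
        self.calls += 1
        # Toy update rule (an assumption, not a trained model): move the noisy
        # action toward a state-dependent target; at t == 1 it lands on the target.
        target = np.tanh(state[: action.shape[0]])
        return action + (target - action) / t


def iterative_sample(net, state, action_dim, num_steps, rng):
    """Diffusion-style sampling: start from noise, refine over num_steps calls."""
    action = rng.standard_normal(action_dim)
    for t in range(num_steps, 0, -1):
        action = net(action, state, t)
    return action


def single_step_sample(net, state, action_dim, rng):
    """Consistency-style sampling: map noise to an action with a single call."""
    action = rng.standard_normal(action_dim)
    return net(action, state, 1)


rng = np.random.default_rng(0)
state = np.array([0.5, -1.0, 2.0])

slow_net = CountingDenoiser()
a_slow = iterative_sample(slow_net, state, action_dim=2, num_steps=50, rng=rng)

fast_net = CountingDenoiser()
a_fast = single_step_sample(fast_net, state, action_dim=2, rng=rng)

print(slow_net.calls, fast_net.calls)  # 50 1
```

With 50 refinement steps, the iterative sampler pays 50 network evaluations per action versus one for the single-step sampler. In this toy both end at the same target, but with trained models the single-step mapping is harder to learn, which is the performance gap the abstract refers to.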