Cross-Domain Energy-Guided Diffusion Generation for Off-Dynamics Reinforcement Learning

arXiv:2605.24810v1

Off-dynamics offline reinforcement learning seeks to learn a target-domain policy from a large source dataset and a limited target dataset under mismatched transition dynamics. Existing approaches, such as reward augmentation and data filtering, are constrained to the source dataset and cannot synthesize new target behavior to improve coverage beyond the collected source trajectories.