AI RESEARCH
Diffusion-Augmented Markov Decision Processes for Maximum Entropy Reinforcement Learning
arXiv CS.AI
arXiv:2512.02019v3 Announce Type: replace-cross
Diffusion models excel at sampling from complex, unnormalized distributions. In this work, we extend Maximum Entropy Reinforcement Learning (ME-RL) to diffusion processes, enabling sampling from the optimal policy trajectory distribution. By minimizing a tractable upper bound on the reverse KL divergence between the diffusion policy and the optimal policy trajectory distributions, we derive a modified surrogate objective and