AI RESEARCH

A note on convergence of Wasserstein policy optimization

arXiv CS.LG

arXiv:2605.22622v1 Announce Type: new Wasserstein Policy Optimization (WPO) is a recently proposed reinforcement learning algorithm that leverages Wasserstein gradient flows to optimize stochastic policies in continuous action spaces. Despite its empirical success, the theoretical convergence properties of WPO in environments with continuous state and action spaces have yet to be fully established. In this note, we argue that WPO within the framework of entropy-regularised Markov Decision Processes converges linearly.