Provably Convergent Actor-Critic for MARL through Risk-aversion

arXiv:2602.12386v2. Learning stationary policies in infinite-horizon general-sum Markov games (MGs) remains a fundamental open problem in Multi-Agent Reinforcement Learning (MARL). While stationary strategies are preferred for their practicality, computing stationary forms of classic game-theoretic equilibria is computationally intractable, in stark contrast to the comparative ease of solving single-agent RL or zero-sum games.