Retaining Suboptimal Actions to Follow Shifting Optima in Multi-Agent Reinforcement Learning

arXiv:2602.17062v2 Announce Type: replace

Value decomposition is a core approach for cooperative multi-agent reinforcement learning (MARL). However, existing methods still rely on a single optimal action and struggle to adapt when the underlying value function shifts during