Minimax-Optimal Policy Regret in Partially Observable Markov Games

arXiv:2606.02363v1

We study sequential decision-making in partially observable environments against strategic, adaptive opponents, modeled as partially observable Markov games (POMGs). The central challenge is to learn latent dynamics from partial observations while facing an adversary whose behavior depends on the learner's strategy. This dependence makes standard regret notions inadequate.