Fairness in two-player zero-sum games with bandit feedback

ArXi:2606.01159v1 Announce Type: new We study two-player zero-sum games (TPZSGs) with bandit feedback under fairness constraints requiring every action to be played with probability at least $\alpha/m$. Existing instance-dependent results target $\textit{pure}$ Nash equilibria, while fairness generically produces $\textit{mixed}$ equilibria, a harder learning target.