Variance-Adaptive Optimal Algorithm for Reinforcement Learning with Multinomial Logit Function Approximation

arXiv:2605.28364v1 Announce Type: cross Reinforcement learning with multinomial logistic (MNL) function approximation has become an important framework due to its flexibility and broad applicability. While existing studies have established regret guarantees under worst-case analysis, they do not capture how performance depends on the variability of the interaction between the learner and the environment. In this paper, we develop a new theoretical analysis for MNL-based Markov decision processes that yields explicit variance-adaptive regret bounds.