Online Learning in MDPs with Partially Adversarial Transitions and Losses

arXiv:2602.09474v2 Announce Type: replace We study reinforcement learning in MDPs whose transition function is stochastic at most steps but may behave adversarially at a fixed subset of $\Lambda$ steps per episode. This model captures environments that are stable except at a few vulnerable points. We