Online Learning in MDPs with Partially Adversarial Transitions and Losses
arXiv CS.LG
arXiv:2602.09474v2 Announce Type: replace

We study reinforcement learning in MDPs whose transition function is stochastic at most of the steps but may behave adversarially at a fixed subset of $\Lambda$ steps per episode. This model captures environments that are stable except at a few vulnerable points. We
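The interaction model in the abstract can be illustrated with a minimal Python sketch. It assumes a finite tabular MDP with horizon `horizon`, a known fixed set of adversarial step indices of size $\Lambda$, and an arbitrary adversary function. The names `run_episode`, `transition_probs`, `policy`, and `adversary` are hypothetical and do not come from the paper. The sketch only simulates the environment's transitions (losses are omitted); it is not the authors' learning algorithm.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

State = int
Action = int
# Hypothetical signatures (assumptions for illustration, not from the paper).
Policy = Callable[[int, State], Action]
Adversary = Callable[[int, State, Action], State]


def run_episode(
    horizon: int,
    adversarial_steps: Set[int],
    transition_probs: Dict[Tuple[State, Action], Sequence[float]],
    policy: Policy,
    adversary: Adversary,
    initial_state: State = 0,
    rng: Optional[random.Random] = None,
) -> List[Tuple[int, State, Action, State]]:
    """Simulate one episode; returns (step, state, action, next_state) tuples."""
    # The adversarial steps form a fixed subset of {0, ..., horizon - 1}.
    if not adversarial_steps <= set(range(horizon)):
        raise ValueError("adversarial steps must lie within the horizon")
    if rng is None:
        rng = random.Random()
    state = initial_state
    trajectory = []
    for h in range(horizon):
        action = policy(h, state)
        if h in adversarial_steps:
            # Adversarial step: the adversary may choose any next state.
            next_state = adversary(h, state, action)
        else:
            # Stochastic step: sample from the fixed distribution P(. | s, a).
            probs = transition_probs[(state, action)]
            next_state = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        trajectory.append((h, state, action, next_state))
        state = next_state
    return trajectory


# Example: 2 states, horizon 5, one adversarial step (Lambda = 1) at h = 2.
# Stochastic steps always lead back to state 0; the adversary forces state 1.
probs = {(s, a): [1.0, 0.0] for s in range(2) for a in range(2)}
traj = run_episode(
    horizon=5,
    adversarial_steps={2},
    transition_probs=probs,
    policy=lambda h, s: 0,
    adversary=lambda h, s, a: 1,
    rng=random.Random(0),
)
print([t[3] for t in traj])  # [0, 0, 1, 0, 0]
```

Keeping the adversarial step indices fixed and known mirrors the "fixed subset of $\Lambda$ steps" in the abstract; the environment is stable everywhere else, so only the step at `h = 2` deviates from the stochastic kernel.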