Commit to the Bit: Reactive Reinforcement Learning Done Right

ArXi:2605.28276v1 Announce Type: new Reinforcement learning algorithms are commonly analyzed (and designed) under the Marko assumption. This is unrealistic, as most environments encountered in practice are either partially observable, or require function approximation that restricts the agent to access non-Markovian state features. We consider the problem of learning an optimal reactive policy in a finite environment with deterministic observations (or equivalently, hard state aggregation). We.