A Direct Approach for Handling Contextual Bandits with Latent State Dynamics

arXiv:2604.08149v2. We consider a linear contextual bandit model where contexts and rewards are governed by a finite hidden Markov chain. We first revisit the simplified model by Nelson, in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts (called beliefs), rather than functions of the hidden states themselves. This simplified model may be handled through a direct reduction to standard linear contextual bandits.
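
To make the reduction concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes the transition matrix `T` and emission matrix `E` of the hidden Markov chain are known, that contexts are discrete symbols, and that each arm has its own parameter vector (a disjoint LinUCB variant). All names and numerical values are hypothetical. Beliefs are computed with a standard forward filter and then passed to the bandit as ordinary feature vectors.

```python
import numpy as np

def belief_update(belief, T, emission_lik):
    """One forward-filter step: propagate through T, then reweight by the context likelihood."""
    predicted = belief @ T              # distribution over the next hidden state
    unnorm = predicted * emission_lik   # Bayes correction with P(context | state)
    return unnorm / unnorm.sum()

class LinUCB:
    """Disjoint LinUCB: one ridge-regression model per arm (an assumed variant)."""
    def __init__(self, n_arms, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.A = np.stack([reg * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta_hat = A_inv @ b
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # optimism bonus
            scores.append(theta_hat @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

def run(n_rounds=500, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical 2-state chain with 3 possible context symbols.
    T = np.array([[0.9, 0.1], [0.2, 0.8]])
    E = np.array([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
    theta = np.array([[1.0, 0.0], [0.0, 1.0]])  # assumed per-arm reward parameters
    state = 0
    belief = np.array([0.5, 0.5])  # uniform prior over hidden states
    agent = LinUCB(n_arms=2, dim=2)
    pulls = np.zeros(2, dtype=int)
    for _ in range(n_rounds):
        state = rng.choice(2, p=T[state])          # hidden transition
        ctx = rng.choice(3, p=E[state])            # observed context
        belief = belief_update(belief, T, E[:, ctx])
        arm = agent.select(belief)                 # belief acts as the feature vector
        # Simplified model: the mean reward is linear in the belief, not in the state.
        reward = theta[arm] @ belief + 0.1 * rng.standard_normal()
        agent.update(arm, belief, reward)
        pulls[arm] += 1
    return belief, pulls
```

Calling `run()` returns the final belief and the number of pulls per arm. The point of the sketch is that once beliefs are available, the learner needs nothing beyond an off-the-shelf linear bandit routine.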