Practical and Optimal Algorithm for Linear Contextual Bandits with Rare Parameter Updates

arXiv:2606.00984v1 Announce Type: cross We study linear contextual bandits under rare parameter updates: the learner may incorporate reward feedback into its parameter estimate only at a small number of update times, while still observing contexts online and selecting actions sequentially. This viewpoint clarifies a practical distinction that is often blurred in the literature: many "strictly batched" methods. additionally.
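
To make the interaction protocol concrete, the following is a minimal sketch of the rare-update setting only, not the algorithm proposed in the paper. It assumes a greedy action rule, a ridge-regression estimate, Gaussian contexts and noise, and a hand-picked update schedule; the names `run_rare_updates`, `update_times`, `solve`, and `dot` are all hypothetical and chosen for illustration. Contexts are observed and actions are chosen at every round, but the estimate used for action selection is refreshed only at the listed update times.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def run_rare_updates(T=2000, d=3, K=5, update_times=(10, 100, 1000), seed=0):
    """Simulate T rounds; the estimate changes only at `update_times`."""
    rng = random.Random(seed)
    theta_star = [rng.gauss(0, 1) for _ in range(d)]  # assumed true parameter
    # Ridge statistics, accumulated every round but not used until an update.
    A = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    theta_hat = [0.0] * d  # estimate actually used for action selection
    n_updates = 0
    schedule = set(update_times)
    for t in range(1, T + 1):
        # Contexts arrive online and an action is chosen in every round,
        # using the current (possibly stale) estimate. Greedy is an assumption.
        contexts = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
        a = max(range(K), key=lambda i: dot(theta_hat, contexts[i]))
        x = contexts[a]
        reward = dot(theta_star, x) + rng.gauss(0, 0.1)
        # Feedback is buffered into the sufficient statistics.
        for i in range(d):
            b[i] += reward * x[i]
            for j in range(d):
                A[i][j] += x[i] * x[j]
        # Only at an update time is the feedback folded into the estimate.
        if t in schedule:
            theta_hat = solve(A, b)
            n_updates += 1
    return theta_hat, theta_star, n_updates


if __name__ == "__main__":
    theta_hat, theta_star, n_updates = run_rare_updates()
    print("updates performed:", n_updates)
    print("estimate:", theta_hat)
    print("truth:   ", theta_star)
```

In this sketch the number of parameter updates equals the length of `update_times` regardless of the horizon `T`, while action selection still reacts to each round's contexts.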