PAC Learning with Bandit Feedback: Sharp Sample Complexity in the Realizable Setting

arXiv:2605.25678v1

We study the problem of multiclass PAC learning with bandit feedback in the realizable setting. In this framework, there is an unknown data distribution over an instance space $\mathcal{X}$ and a label space $\mathcal{Y}$, as in classical multiclass PAC learning, but the learner does not observe the labels of the i.i.d