Instance-dependent Stochastic Lipschitz bandit

arXiv:2605.29748v1 Announce Type: cross We study the Lipschitz bandit problem, where a learner sequentially maximizes an unknown Lipschitz function $f$ over a domain $\mathcal{X} \subset [0,1]^d$ using noisy pointwise evaluations. Existing regret bounds are either worst-case, scaling as $\tilde{\Theta}\left( T^{(d+1)/(d+2)} \right)$, or adaptive via the zooming dimension $d_z$, yielding $\tilde{\Theta}\left( T^{(d_z+1)/(d_z+2)} \right