Batched Single-Index Global Multi-Armed Bandits with Covariates

ArXi:2503.00565v3 Announce Type: replace-cross The multi-armed bandits (MAB) framework is a widely used approach for sequential decision-making, where a decision-maker selects an arm in each round with the goal of maximizing long-term rewards. In many practical applications, such as personalized medicine and recommendation systems, contextual information is available at the time of decision-making, rewards from different arms are related rather than independent, and feedback is provided in batches.