Tree-Guided Identify-Then-Exploit: A Unified Framework of Best Arm Identification and Regret Minimization for Dueling Bandits

ArXi:2606.01799v1 Announce Type: new We study $N$-armed stochastic dueling bandits under the Condorcet-winner assumption, where three widely adopted objectives are considered: best-arm identification (BAI), weak regret, and strong regret. We propose Tree-Guided Identify-Then-Exploit (TG-ITE), the first unified framework to tackle all these objectives to our knowledge. Without requiring stronger assumptions, we propose a shared tree-guided identification approach to find a high-confidence incumbent within $O(N)$ comparisons.