AI RESEARCH

Cost of Structural Learning Under Censored Feedback: A Threshold-Bandit Approach

arXiv CS.LG

arXiv:2605.27076v1 Announce Type: cross In many multi-agent applications, tasks yield rewards only when executed by a coalition meeting an unknown size threshold; otherwise, feedback is fully censored. This censorship creates an identifiability problem: agents cannot distinguish stochastic failure from insufficient coordination. We formalize this setting as the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and analyze it under both centralized and decentralized coordination.
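
The identifiability problem described in the abstract can be illustrated with a minimal sketch. This is not the paper's formal model: it assumes Bernoulli rewards and assumes that a censored pull is observed as a 0, the same signal as a stochastic failure. The names `pull`, `observed_success_rate`, `threshold`, and `success_prob` are hypothetical and chosen only for illustration.

```python
import random


def pull(coalition_size, threshold, success_prob, rng):
    """Simulate one pull of a threshold-activated arm (illustrative assumption).

    Returns 1 only if the coalition meets the threshold AND the Bernoulli
    draw succeeds. A censored pull returns 0, which looks the same as a
    stochastic failure to the agents.
    """
    if coalition_size < threshold:
        return 0  # censored: coalition too small, no reward signal
    return 1 if rng.random() < success_prob else 0


def observed_success_rate(coalition_size, threshold, success_prob,
                          n_pulls, seed=0):
    """Fraction of pulls that returned an observed reward of 1."""
    rng = random.Random(seed)
    total = sum(pull(coalition_size, threshold, success_prob, rng)
                for _ in range(n_pulls))
    return total / n_pulls


if __name__ == "__main__":
    # A good arm pulled by an undersized coalition (always censored).
    undersized = observed_success_rate(2, threshold=3, success_prob=0.9,
                                       n_pulls=1000)
    # A worthless arm pulled by a large enough coalition (always fails).
    worthless = observed_success_rate(3, threshold=3, success_prob=0.0,
                                      n_pulls=1000)
    # Both streams are all zeros, so the two causes cannot be told apart.
    print(undersized, worthless)
```

Under these assumptions, an all-zero history is consistent with both an arm that is worth pulling but needs a larger coalition and an arm that never pays out, so the only way to separate the two is to try larger coalitions.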