Are we chasing ghosts? Quantifying unattributable polarization, and attributing the rest to annotator groups

arXiv:2602.06055v2 Announce Type: replace

Standard agreement metrics often fail to capture systematic differences in opinion between minority and majority-group annotators, jeopardizing tasks such as hate speech and toxicity detection. Polarization has recently been proposed as a robust way of distinguishing minor disagreements from systematic differences in opinion, but existing approaches do not provide practical tools for attributing it to specific annotator groups.