Temporal Simultaneity Predicts Annotation Quality in Sentiment Corpora

arXiv CS.CL

arXiv:2605.27239v1 Announce Type: new

Annotation quality is difficult to sustain when campaigns span weeks or months with small annotator pools. We present a Setswana sentiment dataset of 3,565 tweets annotated by three native-speaker annotators across eight batches and examine why inter-annotator agreement (IAA) declines over time. Despite an aggregate Randolph's free-marginal kappa of $\kappa = 0.76$, conventionally rated "excellent," per-batch $\kappa$ falls by more than 32 points across the annotation task.
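To make the per-batch comparison concrete, the following is a minimal sketch of Randolph's free-marginal kappa, $\kappa_{free} = (P_o - 1/k) / (1 - 1/k)$, where $P_o$ is the mean proportion of agreeing rater pairs per item and $k$ is the number of label categories. It is not the authors' code. The function names, the three-way label set, and the toy batches are assumptions made for illustration only; the abstract does not state the label scheme, and none of the numbers below come from the dataset.

```python
from collections import Counter


def randolph_kappa(ratings, num_categories):
    """Randolph's free-marginal multirater kappa.

    ratings: list of items, each a list of labels (one per rater).
    num_categories: number of label categories k (an assumption the
    caller must supply; chance agreement is fixed at 1/k).
    """
    if not ratings:
        raise ValueError("ratings must contain at least one item")
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    if num_categories < 2:
        raise ValueError("at least two categories are required")

    total = 0.0
    for item in ratings:
        if len(item) != n_raters:
            raise ValueError("every item needs the same number of ratings")
        counts = Counter(item)
        # Ordered rater pairs that agree, over all ordered rater pairs.
        agreeing = sum(c * (c - 1) for c in counts.values())
        total += agreeing / (n_raters * (n_raters - 1))

    p_observed = total / len(ratings)
    p_chance = 1.0 / num_categories
    return (p_observed - p_chance) / (1.0 - p_chance)


def kappa_by_batch(batches, num_categories):
    """Compute kappa separately for each batch (dict: batch id -> ratings)."""
    return {
        batch_id: randolph_kappa(items, num_categories)
        for batch_id, items in batches.items()
    }


# Toy data, invented for illustration: three raters, hypothetical labels.
toy_batches = {
    1: [["pos", "pos", "pos"], ["neg", "neg", "neg"]],
    2: [["pos", "pos", "neg"], ["neu", "neg", "pos"]],
}
per_batch = kappa_by_batch(toy_batches, num_categories=3)
```

Computing the statistic per batch rather than once over the pooled corpus is what exposes a decline that a single aggregate figure can hide.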