Improving Labeling Consistency with Detailed Constitutional Definitions and AI-Driven Evaluation

arXiv:2605.24247v1 Announce Type: cross Many automated labeling pipelines classify inputs into categories defined by a written specification, with content moderation being a prominent use case. Simple category definitions are not detailed enough for labelers to produce the accurate, consistent golden labels these pipelines require. One solution is to write a prescriptive definition that settles enough real boundary cases that labelers cannot disagree with the written interpretation.