AI RESEARCH
How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness
arXiv CS.LG
arXiv:2605.23628v1 Announce Type: new Multi-task benchmarks have become a central pillar of machine learning research, yet their growing influence has incentivised benchmark gaming -- strategic actions taken to improve the leaderboard rank of a specific model. Treating datasets as voters and models as candidates, we consider benchmark-specific