How Hard is it to Rig a Benchmark? A Social Choice Analysis of Leaderboard Robustness

arXiv:2605.23628v1 Announce Type: new Multi-task benchmarks have become a central pillar of machine learning research, yet their growing influence has incentivised benchmark gaming -- strategic actions taken to improve the leaderboard rank of a specific model. Treating datasets as voters and models as candidates, we consider benchmark-specific