SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

arXiv:2605.21384v1 Announce Type: cross

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking arises naturally in this setup, as the agent optimizes for passing tests while deviating from the user's true goal.