It's Not the Capability: Harness Sensitivity Is Non-Monotone Across LLM Agent Tiers

arXiv:2605.26731v1

A prevalent assumption in LLM agent deployment holds that structured harnesses universally improve reliability and that higher-capability models need proportionally less structural guidance. Together, these beliefs imply a monotone inverse relationship between model capability tier and optimal harness complexity. We test this hypothesis through a controlled 432-run experiment crossing six models across four capability tiers with three harness conditions (light, balanced, strict) on HEAT-24, a 24-task synthetic benchmark.
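The design and the hypothesis under test can be made concrete with a small sketch. It assumes that the 432 runs form a full factorial grid of 6 models × 3 harness conditions × 24 tasks (6 × 3 × 24 = 432), that harness complexity is ordered light < balanced < strict, and that capability tiers are integers with higher meaning more capable. The model names, the `success` mapping, and all helper functions are hypothetical illustrations, not the authors' code. Under these assumptions, "monotone inverse" means that the best-performing harness never becomes more complex as the tier rises.

```python
from itertools import product

# Assumed ordering of harness conditions by structural complexity.
HARNESSES = ["light", "balanced", "strict"]
COMPLEXITY = {h: i for i, h in enumerate(HARNESSES)}


def build_design(models, harnesses=HARNESSES, n_tasks=24):
    """Full factorial grid (assumed): one run per (model, harness, task) cell."""
    return [(m, h, t) for m, h, t in product(models, harnesses, range(n_tasks))]


def optimal_harness_by_tier(success):
    """Map each tier to its best harness.

    `success` is a hypothetical dict keyed by (tier, harness) with success
    rates as values. Ties go to the harness listed first in HARNESSES
    (i.e. the lighter one), because max() keeps the first maximal item.
    """
    tiers = sorted({tier for tier, _ in success})
    best = {}
    for tier in tiers:
        rates = {h: success[(tier, h)] for h in HARNESSES if (tier, h) in success}
        best[tier] = max(rates, key=rates.get)
    return best


def is_monotone_inverse(best):
    """True if optimal harness complexity never increases as tier increases."""
    levels = [COMPLEXITY[best[t]] for t in sorted(best)]
    return all(a >= b for a, b in zip(levels, levels[1:]))
```

For example, with six placeholder models, `build_design([f"model_{i}" for i in range(6)])` yields 432 distinct run cells. A single tier whose best harness is more complex than that of a lower tier is enough for `is_monotone_inverse` to return `False`, which is the kind of non-monotone pattern the title refers to.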