The Same AI Model Can Perform 6x Better: Here's Why


A Stanford and Tsinghua paper ran a controlled experiment earlier this year: the same model, with a different harness architecture around it. The result was a 6x performance gap driven entirely by the system built around the model, not by the model itself. This is not a prompt engineering insight. It is a systems architecture insight, and it changes where developers should invest their time when building agentic systems.

The 6x Gap

Meta-Harness tested Claude Opus 4.6 across two harness configurations on TerminalBench-2.