Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

arXiv:2606.05145v1

When post-trained language models fail on reasoning problems, the common test-time-scaling response is to spend compute on additional attempts, and the failed traces play no further role. We argue that this discards a crucial signal: some failures stem from unlucky sampling, where additional rollouts help, while others are structural and resist resampling regardless of budget. We propose that failed traces encode recoverability structure: the inference-time signature of which test-time interventions can rescue a given failure.
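To make the distinction between unlucky and structural failures concrete, here is a minimal sketch. It assumes that each attempt is an independent draw with a fixed per-attempt success probability `p`, a simplification introduced here for illustration and not the paper's model. Under that assumption, the chance that at least one of `k` resamples succeeds is 1 - (1 - p)^k. This approaches 1 as `k` grows whenever `p > 0`, but stays at 0 for every budget when `p = 0`. The function name `rescue_probability` is hypothetical.

```python
def rescue_probability(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds.

    Illustrative assumption (not the paper's model): every attempt succeeds
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    # P(all k attempts fail) = (1 - p) ** k, so P(at least one succeeds) is its complement.
    return 1.0 - (1.0 - p) ** k


# "Unlucky" failure: the model sometimes solves the problem (p = 0.3),
# so a moderate resampling budget usually rescues it.
for k in (1, 4, 16):
    print(f"p=0.3, k={k:3d}: {rescue_probability(0.3, k):.3f}")

# "Structural" failure: the model never solves it (p = 0.0),
# so the rescue probability stays at zero however large the budget.
for k in (1, 4, 16, 256):
    print(f"p=0.0, k={k:3d}: {rescue_probability(0.0, k):.3f}")
```

With `p = 0.3` the rescue probability rises from 0.300 at `k = 1` to about 0.997 at `k = 16`, while `p = 0.0` gives 0.000 at every budget. In practice `p` is not known for a given problem; the abstract's proposal is that failed traces carry signal about which regime a failure falls in.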