Predicting Inference-Time Scaling Gains from Labeled Validation-Set Output Statistics

arXiv:2606.02981v1 Announce Type: new Best-of-$N$ inference scaling (drawing $N$ candidate answers from a language model and returning the one a reward model ranks highest) improves accuracy by an amount that varies across models, but predicting that amount in advance currently requires running the procedure end-to-end.
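
For concreteness, the best-of-$N$ procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the names `best_of_n`, `sample`, and `reward` are hypothetical, with `sample` standing in for one draw from a language model and `reward` for a reward model's score.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    sample: Callable[[str], str],         # hypothetical: one draw from a language model
    reward: Callable[[str, str], float],  # hypothetical: reward-model score for (prompt, answer)
    n: int,
) -> str:
    """Draw n candidate answers and return the one with the highest reward score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Draw N candidates independently from the sampler.
    candidates: List[str] = [sample(prompt) for _ in range(n)]
    # Return the candidate the reward function ranks highest
    # (ties resolve to the earliest candidate drawn).
    return max(candidates, key=lambda answer: reward(prompt, answer))
```

The cost of a single call grows linearly in `n`, since every candidate must be generated and scored, which is why measuring the accuracy gain by running the procedure end-to-end is expensive.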