FineVerify: Scaling Test-Time Compute with Fine-Grained Self-Verification for Agentic Search

arXiv:2606.00660v1 Announce Type: new Agentic search requires language model agents to explore many sources and answer complex information-seeking questions. Scaling test-time compute is a promising way to improve these agents, but current approaches can fail because correct answers are often sparse and score-based selection depends on model calibration.