Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses

arXiv:2606.02373v1 Announce Type: new

Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain reliably. We.
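
To make the idea of environment-maintained bookkeeping concrete, the following is a minimal illustrative sketch, not the paper's implementation. It assumes a hypothetical `SearchState` object held by the environment that tracks the four kinds of state the abstract names (what has been seen, which evidence is useful, which constraints remain open, which claims have been checked) and renders a compact summary for the policy in place of a growing transcript. All class, method, and field names are assumptions introduced here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    """Hypothetical environment-side record of routine search bookkeeping."""

    seen_docs: dict[str, str] = field(default_factory=dict)   # doc_id -> snippet
    useful_evidence: set[str] = field(default_factory=set)    # doc_ids flagged useful
    open_constraints: set[str] = field(default_factory=set)   # constraints still unmet
    checked_claims: set[str] = field(default_factory=set)     # claims already verified

    def record_result(self, doc_id: str, snippet: str) -> bool:
        """Store a search result; return False if it was already seen."""
        if doc_id in self.seen_docs:
            return False
        self.seen_docs[doc_id] = snippet
        return True

    def mark_useful(self, doc_id: str) -> None:
        """Flag a previously seen document as useful evidence."""
        if doc_id not in self.seen_docs:
            raise KeyError(f"unknown document: {doc_id}")
        self.useful_evidence.add(doc_id)

    def close_constraint(self, constraint: str) -> None:
        """Mark a constraint as satisfied (no-op if it was not open)."""
        self.open_constraints.discard(constraint)

    def mark_checked(self, claim: str) -> None:
        """Record that a claim has been verified."""
        self.checked_claims.add(claim)

    def render(self) -> str:
        """Compact summary the policy could condition on (assumed format)."""
        return "\n".join([
            f"seen: {len(self.seen_docs)}",
            f"useful: {sorted(self.useful_evidence)}",
            f"open constraints: {sorted(self.open_constraints)}",
            f"checked claims: {sorted(self.checked_claims)}",
        ])


# Usage: the environment updates the state; the policy only reads the summary.
state = SearchState(open_constraints={"published after 2020", "peer reviewed"})
state.record_result("doc-1", "A 2021 peer-reviewed study ...")
state.mark_useful("doc-1")
state.close_constraint("published after 2020")
state.mark_checked("the study is from 2021")
summary = state.render()
```

Under this assumed design, the policy's action space stays focused on search decisions, while deduplication and constraint tracking are deterministic operations that need not be learned.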