Beyond Trajectory Rewards: Step-level Credit Assignment for Agentic Search via Graph Modeling

arXiv:2605.29697v1 Announce Type: new In Agentic Search, trajectory-level outcome rewards fail to quantify the contributions of individual steps, while existing step-level reward methods typically rely on costly tree sampling. We view world knowledge as a latent world graph and each IS task as a search within a latent task graph, where an effective step is one that makes progress through the graph toward the answer node.
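
To make the idea of step-level "graph progress" concrete, here is a minimal sketch. It is not the paper's method: it assumes the task graph is fully known and undirected, and it scores each step by how much it reduces the shortest-path hop distance to the answer node. The names `distances_to`, `step_rewards`, and `toy_graph` are hypothetical and exist only for illustration.

```python
from collections import deque


def distances_to(graph, target):
    """BFS hop counts to `target` for every node that can reach it.

    Assumes `graph` is an undirected adjacency dict (edges listed both ways).
    """
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def step_rewards(graph, trajectory, answer):
    """Illustrative step reward: reduction in distance to the answer node.

    Positive when a step moves closer to the answer, negative when it moves away.
    """
    dist = distances_to(graph, answer)
    return [dist[prev] - dist[curr] for prev, curr in zip(trajectory, trajectory[1:])]


# Hypothetical toy task graph; in the abstract's framing the real graph is latent.
toy_graph = {
    "question": ["entity_a", "entity_b"],
    "entity_a": ["question", "entity_c"],
    "entity_b": ["question"],
    "entity_c": ["entity_a", "answer"],
    "answer": ["entity_c"],
}

# A trajectory with one detour (question -> entity_b -> question) before the answer.
trajectory = ["question", "entity_b", "question", "entity_a", "entity_c", "answer"]
rewards = step_rewards(toy_graph, trajectory, "answer")
print(rewards)  # [-1, 1, 1, 1, 1]
```

In this toy case the detour step receives a negative reward and the step that undoes it a positive one, whereas a single trajectory-level outcome reward would assign all five steps the same credit.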