AI RESEARCH

OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories

arXiv cs.AI

arXiv:2605.29253v1 Announce Type: new Task success can hide process anomalies in real-world agent executions. An agent may pass the final task oracle while still accumulating unresolved ambiguity, unsafe external writes, ignored errors, weakly grounded commitments, or capability-boundary overcommitment. We study this mismatch as the Outcome-Process Gap and