Verifying Agentic Development at Scale (8 minute read)

TLDR AI
Generative AI

Cognition's Ido Pesok shares lessons from building autonomous end-to-end testing into Devin, noting that, for the first time, Devin sessions are triggered more often asynchronously than interactively, making verified-before-merge results a hard requirement rather than a nicety. Devin's harness gained computer-use tools roughly six months ago, and the breakthrough came when engineers started running 10-20 Devins in parallel, each with its own dev server, something impossible on a single laptop.