Your Agent Passed Every Test. It's Still Going to Break in Production.
Towards AI
•
Generative AI
AI Research
The continuous evaluation loop is the new CI/CD pipeline. Here’s what that means in practice. The loop is the unit of work. AgentOps: DevOps in the Agentic Era The first time I deployed a non-trivial agent to a customer-facing environment, I did what any engineer with a decade of DevOps muscle memory would do. I wrote unit tests. Wired up a CI pipeline. Ran it through staging. Watched the green checks pile up. Shipped it. Two days later, the agent decided to call the same tool seventeen times in a row before giving up. The tests had all passed. The logs were clean.