Anthropic just published how they contain Claude agents, including two security incidents they got wrong

r/artificial
Generative AI AI Research

Anthropic dropped a solid engineering post this week about containment across claude.ai, Claude Code, and Cowork. One of the transparent writeups from a major AI lab about what actually broke. The core insight: model-layer defenses are probabilistic and will always have a non-zero miss rate. So the real answer is hard environmental containment, not just safer models.