Anthropic just published how they contain Claude agents, including two security incidents they got wrong
r/artificial
•
Generative AI
AI Research
Anthropic dropped a solid engineering post this week about containment across claude.ai, Claude Code, and Cowork. One of the transparent writeups from a major AI lab about what actually broke. The core insight: model-layer defenses are probabilistic and will always have a non-zero miss rate. So the real answer is hard environmental containment, not just safer models.