The attack on AI agents that no security tool catches
r/artificial
•
Generative AI
Been working on AI agent security for a while and the attack that concerns me most barely gets talked about. Not the obvious stuff like “ignore previous instructions.” Those get caught. The scary one is when an attacker spreads the attack across multiple messages. Each message looks totally normal. The model sees nothing suspicious. But by message 8 it’s doing something it absolutely should not be doing. Every security tool I’ve tested evaluates messages one at a time. None of them remember what happened three messages ago. Built Bendex Arc to catch this.