SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents

arXiv:2606.02302v1 Announce Type: cross Autonomous LLM agents increasingly operate in stateful environments where they access tools, files, memory, and external services. While such capabilities enable complex real-world workflows, they also