AI RESEARCH

CRAB-Bench: Evaluating LLM Agents under Complex Task Dependencies and Human-aligned User Simulation

arXiv CS.CL

ArXi:2606.01815v1 Announce Type: new Evaluating LLM agents in realistic service scenarios requires complex task dependencies, imperfect user behavior, and an evaluation that accommodates multiple valid solutions. We