CRAB-Bench: Evaluating LLM Agents under Complex Task Dependencies and Human-aligned User Simulation
arXiv cs.CL
arXiv:2606.01815v1 Announce Type: new
Evaluating LLM agents in realistic service scenarios requires accounting for complex task dependencies and imperfect user behavior, and it requires an evaluation that accommodates multiple valid solutions. We