AI RESEARCH

GroupTravelBench: Benchmarking LLM Agents on Multi-Person Travel Planning

arXiv cs.CL

arXiv:2605.25200v1 Announce Type: new

Travel planning is a realistic task for evaluating the planning and tool-use abilities of LLM agents. However, existing benchmarks typically assume a single user, and so avoid one of the most challenging aspects of real-world scenarios: an agent's ability to identify and resolve conflicts among multiple users. To address this gap, we