The Agent's First Day: Benchmarking Learning, Exploration, and Scheduling in the Workplace Scenarios

arXiv:2601.08173v2 Announce Type: replace The rapid evolution of Multi-modal Large Language Models (MLLMs) has advanced workflow automation; however, existing research mainly targets performance upper bounds in static environments, overlooking robustness for stochastic real-world deployment. We identify three key challenges: dynamic task scheduling, active exploration under uncertainty, and continuous learning from experience. To bridge this gap, we