Scaling, Benchmarking, and Reasoning of Vision-Language Agents for Mobile GUI Navigation

arXiv:2605.27134v1

Vision-Language Models (VLMs) have shown rapid progress in mobile GUI navigation. This paper presents a systematic study of data scaling, benchmarking, and reasoning for VLM-based agents in this domain. To facilitate rigorous evaluation, we