Enhancing Trustworthy GUI Grounding via Self-Critiqued Reinforcement Learning

arXiv:2510.27266v2 Announce Type: replace

Autonomous graphical user interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement learning (RL), often provide confidence signals that are poorly aligned with actual grounding correctness, leading to overconfident and unreliable predictions.