SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

arXiv:2512.04069v2 Announce Type: replace Vision Language Models (VLMs) demonstrate strong qualitative visual understanding, but struggle with the metrically precise spatial reasoning required for embodied applications. The agentic paradigm promises that VLMs can use a wide variety of tools that could augment these capabilities, such as depth estimators, segmentation models, and pose estimators.