AI RESEARCH
Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents
arXiv CS.AI
arXiv:2606.00096v1 Announce Type: cross
Visual agents employ external visual tools within visual chains of thought to incorporate fine-grained evidence. While prior work has mainly studied these tools in visual search tasks, their role in complex visual reasoning remains underexplored. In this paper, we move beyond simple visual search to investigate more challenging tasks, including 3D spatial reasoning and medical visual question answering, where agents must integrate tool-acquired local evidence with the global context.