The Yes-Man Syndrome: Benchmarking Abstention in Embodied Robotic Agents

arXiv:2605.20544v1 Announce Type: cross

Vision-language models (VLMs) are used as high-level planners for embodied agents, translating natural language instructions and visual observations into action plans. While prior work has studied abstention in LLMs, existing benchmarks are largely text-only and do not capture the perceptual grounding and physical constraints inherent to embodied robotics environments.