Continuous Reasoning for Vision-Language-Action

arXiv:2606.00229v1 Announce Type: cross Natural language is a powerful reasoning medium for language and vision-language models, but it is mismatched to the granularity of continuous control. Text and explicit subgoals operate at task-level granularity, whereas vision-language-action (VLA) policies must choose actions at a much finer temporal scale; a single reasoning step can therefore span many action chunks while remaining only weakly coupled to the action needed now.