VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation

arXiv:2602.07399v2 Announce Type: replace Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks from scarce demonstrations remains unreliable. Although fine-tuned VLA policies often produce semantically plausible trajectories, failures frequently arise from unresolved geometric ambiguities, where near-miss actions lead to divergent execution outcomes under limited supervision.