VISTA: Vision-Grounded and Physics-Validated Adaptation of UMI Data for VLA Training

arXiv:2606.04708v1 Announce Type: cross

The Universal Manipulation Interface (UMI) enables scalable real-world robot data collection without hardware-specific teleoperation. However, using UMI data to train large-scale Vision-Language-Action (VLA) models remains fundamentally challenging.