ProgVLA: Progress-Aware Robot Manipulation Skill Learning

arXiv:2605.28231v1 Announce Type: cross We present ProgVLA, a compact vision-language-action (VLA) model designed for reliable robot manipulation under tight compute and memory budgets. The model focuses on processing long multimodal sequences efficiently by maintaining an explicit representation of task progress over extended horizons. To this end, ProgVLA integrates two key components.