PROGRESSLM: Towards Progress Reasoning in Vision-Language Models

arXiv:2601.15224v2 Announce Type: replace-cross Estimating task progress requires reasoning over long-horizon dynamics rather than recognizing static visual content. While modern Vision-Language Models (VLMs) excel at describing what is visible, it remains unclear whether they can infer how far a task has progressed from partial observations. To this end, we