EvoVid: Temporal-Centric Self-Evolution for Video Large Language Models

arXiv:2605.21931v1 Announce Type: new

Recent Video Large Language Models (Video-LLMs) have demonstrated strong capabilities in video reasoning through reinforcement learning (RL). However, existing RL pipelines rely heavily on human-annotated tasks and solutions, making them costly to scale and fundamentally constrained by human expertise. Self-evolving frameworks have recently emerged as a promising alternative through autonomous Questioner-Solver self-play.