
Towards Sparse Video Understanding and Reasoning

arXiv cs.LG

arXiv:2602.13602v2 Announce Type: replace-cross We present ReViSe (Reasoning with Video Sparsity), a multi-round agent for video question answering (VQA). Instead of uniformly sampling frames, ReViSe selects a small set of informative frames, maintains a summary-as-state across rounds, and stops early when confident. It s
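
The loop described in the abstract (select a few frames, update a running summary, stop once confident) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `RoundState`, `sparse_vqa`, `select`, `read`, `max_rounds`, and `threshold` are all hypothetical, and the toy selector and reader stand in for whatever models the paper actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class RoundState:
    """Summary-as-state: a compact text summary plus the frame indices seen so far."""
    summary: str = ""
    seen: List[int] = field(default_factory=list)


def sparse_vqa(question, num_frames, select, read, max_rounds=4, threshold=0.8):
    """Multi-round sparse VQA loop (hypothetical interface).

    select(question, state, num_frames) -> frame indices to inspect next
    read(question, state, picked) -> (new_summary, answer_or_None, confidence)
    """
    state = RoundState()
    answer: Optional[str] = None
    for _ in range(max_rounds):
        # Keep only valid frame indices that have not been inspected yet.
        picked = [i for i in select(question, state, num_frames)
                  if 0 <= i < num_frames and i not in state.seen]
        if not picked:
            break  # the selector has nothing new to look at
        summary, answer, confidence = read(question, state, picked)
        state.summary = summary      # carry the summary to the next round
        state.seen.extend(picked)
        if answer is not None and confidence >= threshold:
            break                    # early stop when confident
    return answer, state


# Toy stand-ins (assumptions for illustration only).
def toy_select(question: str, state: RoundState, num_frames: int) -> Sequence[int]:
    candidates = [0, 20, 40, 60, 80]
    return [i for i in candidates if i not in state.seen][:2]


def toy_read(question: str, state: RoundState,
             picked: Sequence[int]) -> Tuple[str, Optional[str], float]:
    if 40 in picked:  # pretend the relevant event is visible at frame 40
        return "event visible at frame 40", "yes", 0.9
    return state.summary + f" nothing in {list(picked)};", None, 0.1


def demo():
    return sparse_vqa("Does the event occur?", 100, toy_select, toy_read)
```

In this toy run, `demo()` inspects four of the 100 frames over two rounds before stopping, which is the kind of saving that sparse selection with early stopping is meant to give over uniform sampling.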