Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval

arXiv CS.CL

arXiv:2605.23826v1 (Announce Type: cross)

Keyframe selection is a direct way to provide verifiable visual evidence for long-video question answering (QA). Queries differ in what they require, and finding the right frames depends on knowing what to look for. Existing keyframe selectors either score every frame against a single query or decompose the query into a fixed schema that is evaluated by a single visual tool.
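To make the first baseline concrete, here is a minimal sketch of "score every frame against a single query". It assumes that each frame and the query have already been embedded into the same vector space by some vision-language encoder, which is not specified here. It also assumes cosine similarity as the score and a fixed top-k cutoff. The names `select_keyframes` and `cosine` are hypothetical. This illustrates the single-query approach the abstract contrasts against, not the paper's tool-call method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_keyframes(
    frame_embs: List[Sequence[float]], query_emb: Sequence[float], k: int
) -> List[int]:
    """Hypothetical single-query selector: score every frame against one query
    embedding and return the indices of the k best frames in temporal order."""
    # One score per frame, all computed against the same query vector.
    scores = [cosine(f, query_emb) for f in frame_embs]
    # Rank frame indices by descending score (stable sort keeps earlier frames first on ties).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top k and restore temporal order for presentation as evidence.
    return sorted(ranked[:k])


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(select_keyframes(frames, [1.0, 0.0], k=2))  # [0, 2]
```

Because a single query vector scores every frame, a question that needs several distinct kinds of evidence must still compete for the same k slots under one similarity measure. That limitation is what motivates decomposing the query.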