Reason-Then-Retrieve for CoVR-R with Structured Edit Prompts and Dense-Sparse Fusion

arXiv:2606.02450v1

CoVR-R studies reason-aware composed video retrieval: given a reference video and an edit instruction, the system must retrieve the target video that satisfies the edit. The main difficulty is that the target is not described directly; it must be inferred from fine-grained changes in object identity, action order, final state, hand interaction, and scene transition. We build a zero-shot reason-then-retrieve pipeline around Qwen3.5-27B.