Training-Free Composed Video Retrieval via Visual Representation-Guided Video-LLM Reasoning

arXiv:2606.02321v1

Recent advances in large vision-language models have expanded video retrieval from simple text-based search to more flexible scenarios in which users specify the desired result through both visual examples and textual instructions. In the CVPR 2026 Reason-Aware Composed Video Retrieval Challenge, a system must retrieve a target video given a reference video and a modification instruction. To address this task, we develop Visual Representation-Guided Video-LLM Reasoning for.