VCIFBench: Evaluating Complex Instruction Following for Video Understanding

arXiv cs.CL

arXiv:2606.04588v1 Announce Type: new

Multimodal large language models have made rapid progress in video understanding, yet existing benchmarks largely rely on simple prompts and provide limited evidence about whether models can satisfy explicit output constraints. We