CodecCap: High-Fidelity Codec-Inspired Residual Modeling for Dense Video Captioning

arXiv:2605.26967v1 Announce Type: new Existing video captioning methods struggle to balance visual fidelity and redundancy: holistic captions are compact but lose fine-grained evidence, whereas segment-wise captions improve coverage but