AI RESEARCH

What Does the Caption Really Say? Counterfactual Phrase Intervention for Compositional Data Selection in Vision-Language Pretraining

arXiv cs.CV

arXiv:2605.22651v1 Announce Type: new CLIP-style contrastive pre