AI RESEARCH
What Does the Caption Really Say? Counterfactual Phrase Intervention for Compositional Data Selection in Vision-Language Pretraining
arXiv cs.CV
arXiv:2605.22651v1 Announce Type: new CLIP-style contrastive pre