AI RESEARCH

Density-Aware Translation of Spurious Correlations in Zero-Shot VLMs

arXiv cs.LG

arXiv:2606.01710v1 Announce Type: cross

Vision-language models (VLMs) such as CLIP achieve strong zero-shot classification performance. However, their predictions remain sensitive to spurious correlations, in which contextual cues dominate over semantic content. Earlier solutions typically rely on fine-tuning or prompt engineering, which either undermine the advantages of pre-trained models or are prone to hallucination. In this work, we propose Density-Aware Translation (DAT), which refines image-text similarity scores using a local geometric density term derived from group reference sets.
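The abstract does not state the DAT formula, so the following is only an illustrative sketch of the general idea of adjusting zero-shot similarity scores with a local density term. Everything specific here is an assumption and not taken from the paper: the density estimate (mean cosine similarity to the `k` nearest reference embeddings), the additive combination, the `weight` parameter, the use of one reference set per class, and all function names. The toy 2-D vectors stand in for real CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def local_density(image_emb, reference_embs, k=2):
    """Hypothetical density proxy: mean similarity to the k nearest references.

    Assumes reference_embs is non-empty. This is NOT the paper's definition.
    """
    sims = sorted((cosine(image_emb, r) for r in reference_embs), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def plain_scores(image_emb, text_embs):
    """Standard zero-shot scores: image-text cosine similarity per class."""
    return [cosine(image_emb, t) for t in text_embs]


def density_adjusted_scores(image_emb, text_embs, reference_sets, k=2, weight=0.5):
    """Assumed combination: similarity plus a weighted density term per class."""
    return [
        cosine(image_emb, t) + weight * local_density(image_emb, refs, k)
        for t, refs in zip(text_embs, reference_sets)
    ]


def predict(scores):
    """Index of the highest-scoring class."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data: the image is closer to the class-1 text prompt (a stand-in for a
# spurious contextual cue) but sits in a dense region of class-0 references.
image = [0.6, 0.8]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
reference_sets = [
    [[0.6, 0.8], [0.8, 0.6]],   # class 0 references, near the image
    [[-0.8, 0.6], [0.0, 1.0]],  # class 1 references, more spread out
]

plain = plain_scores(image, text_embs)
adjusted = density_adjusted_scores(image, text_embs, reference_sets)
print(predict(plain), predict(adjusted))  # prints "1 0"
```

In this toy setup the density term flips the prediction from class 1 to class 0 without any fine-tuning or prompt changes. Whether the actual method combines the terms additively or defines density this way is not something the abstract confirms.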