Cross-Modal Attention Calibration for LVLM Hallucination Mitigation

arXiv:2501.01926v3 Announce Type: replace-cross

Large vision-language models (LVLMs) have shown remarkable capabilities in visual-language understanding. Despite this success, LVLMs still hallucinate in complex generation tasks, producing content that is inconsistent with the visual input. To address this issue, some approaches have