Focusing Where Vision Matters: Selective Training for Large Vision Language Models via Visual Information Gain

arXiv:2602.17186v2 Announce Type: replace

Large Vision Language Models (LVLMs) have achieved remarkable progress, yet they often suffer from language bias, producing answers without relying on visual evidence. While prior work attempts to mitigate this issue through decoding strategies, architectural modifications, or curated instruction data, it typically lacks a quantitative measure of how much individual