Mitigating Hallucination in Vision-Language Models through Barrier-Regulated Adaptive Closed-form Steering

arXiv:2605.29881v1 Announce Type: cross

Large vision-language models (LVLMs) often hallucinate objects that are not present in the input image, largely because visual grounding weakens as decoding progresses. Existing inference-time mitigation methods modify logits or hidden states throughout generation, but they suffer from three key limitations: they lack an explicit grounding objective, they intervene even when the model is already well grounded, and they use fixed correction strengths that do not adapt to the severity of the grounding failure.