Universal Boosts, Specific Suppressors: Sparse Autoencoder Steering of Medical Vision-Language Models

arXiv:2605.24977v1 Announce Type: cross Medical vision-language models (VLMs) often hallucinate when generating chest X-ray reports: they fabricate findings that are not present in the image, miss important ones, or locate them incorrectly. We mitigate this without weight updates by decoding-time residual steering on a per-token sparse autoencoder (SAE) basis: Top-$K$ SAEs on late layers, causal steering against clinical errors, and then a combined suppress/boost intervention at inference time.
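
To make the suppress/boost idea concrete, here is a minimal NumPy sketch of steering a single token's residual vector on a Top-K SAE basis. It is an illustration under stated assumptions, not the paper's implementation: the function names (`topk_encode`, `steer`), the weight layout (`W_enc` and `W_dec` both of shape `(n_features, d_model)`), the scaling parameters `alpha` and `beta`, and the specific update rule (subtract a suppressed feature's decoded contribution, add a fixed multiple of a boosted feature's decoder direction) are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def topk_encode(h, W_enc, b_enc, k):
    """Top-K SAE encoder: keep the k largest ReLU pre-activations, zero the rest.

    Shapes (assumed): h (d_model,), W_enc (n_features, d_model), b_enc (n_features,).
    """
    pre = np.maximum(W_enc @ h + b_enc, 0.0)
    z = np.zeros_like(pre)
    idx = np.argpartition(pre, -k)[-k:]  # indices of the k largest entries
    z[idx] = pre[idx]
    return z


def steer(h, W_enc, b_enc, W_dec, k, suppress, boost, alpha=1.0, beta=1.0):
    """Return a steered copy of the residual vector h (weights are untouched).

    suppress: feature indices whose decoded contribution is subtracted,
              scaled by alpha (alpha=1 removes the contribution entirely).
    boost:    feature indices whose decoder direction is added with a
              fixed strength beta, regardless of current activation.
    The update rule itself is an assumption made for illustration.
    """
    z = topk_encode(h, W_enc, b_enc, k)
    delta = np.zeros_like(h)
    for j in suppress:
        delta -= alpha * z[j] * W_dec[j]
    for j in boost:
        delta += beta * W_dec[j]
    return h + delta


# Toy usage: identity encoder/decoder so each feature is one coordinate.
W = np.eye(4)
h = np.array([3.0, 1.0, 2.0, 0.5])
steered = steer(h, W, np.zeros(4), W, k=2, suppress=[0], boost=[1], beta=0.5)
```

In the toy example, the Top-2 code keeps features 0 and 2; suppressing feature 0 removes its contribution from the residual, while boosting feature 1 adds half of its decoder direction even though it was not among the active features. In a real model this update would be applied inside a forward hook at the chosen layer for each generated token.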