Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models

arXiv:2606.04922v1 Announce Type: cross

Current prompt-based and adapter-based tuning of vision-language models (VLMs) is attractive for medical imaging, where the sensitivity of clinical data favors frozen backbones and annotations are scarce. However, these methods typically optimize only the ground-truth class and treat all other classes as equally incorrect. This ignores clinically meaningful relations between classes and yields unstable decision boundaries when supervision is limited.
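The limitation of optimizing only the ground-truth class can be made concrete with a small worked example. With a one-hot target, cross-entropy reduces to -log p_y, so it does not matter how the remaining probability mass is spread over the wrong classes. The sketch below is a generic illustration under stated assumptions, not the paper's geometry-aware distillation method. It compares that loss with cross-entropy against a relation-aware soft target. The class names, the similarity values, the temperature `tau`, and the helper `relation_target` are all hypothetical.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def hard_ce(probs, y):
    # One-hot cross-entropy: only the ground-truth probability matters.
    return -math.log(probs[y])

def soft_ce(probs, target):
    # Cross-entropy against a soft target distribution over all classes.
    return -sum(t * math.log(p) for t, p in zip(target, probs))

def relation_target(similarity_row, tau):
    # Hypothetical relation-aware target: a softmax over the ground-truth
    # class's similarity to every class, sharpened by temperature tau.
    return softmax([s / tau for s in similarity_row])

# Hypothetical classes: 0 = "normal", 1 = "benign", 2 = "malignant".
y = 2
# Hypothetical similarity of "malignant" to each class (itself = 1.0).
similarity_row = [0.1, 0.7, 1.0]
target = relation_target(similarity_row, tau=0.1)

# Same ground-truth probability, different placement of the wrong mass.
p_a = [0.1, 0.3, 0.6]  # wrong mass mostly on the related "benign" class
p_b = [0.3, 0.1, 0.6]  # wrong mass mostly on the unrelated "normal" class

print(hard_ce(p_a, y), hard_ce(p_b, y))          # identical
print(soft_ce(p_a, target), soft_ce(p_b, target))  # p_a is penalized less
```

Under one-hot cross-entropy the two predictions are indistinguishable. The soft target assigns a lower loss to the prediction whose errors fall on the clinically closer class, which is the kind of class-relation signal the abstract says ground-truth-only objectives discard.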