MedSAE: Dissecting MedCLIP Representations with Sparse Autoencoders

ArXi:2510.26411v2 Announce Type: replace Artificial intelligence in healthcare requires models that are accurate and interpretable. We advance mechanistic interpretability in medical vision by applying Medical Sparse Autoencoders (MedSAEs) to the latent space of MedCLIP, a vision-language model trained on chest radiographs and reports. To quantify interpretability, we propose an evaluation framework that combines correlation metrics, entropy analyses, and automated neuron naming via the MedGemma foundation model.