Exploring the Capabilities of Large Language Model Encoders for Image-Text Retrieval in Chest X-rays

arXiv:2509.15234v2

Multimodal learning from paired medical images and clinical text is a central challenge in medical data-driven informatics, where effective cross-modal alignment is critical for scalable analysis and retrieval. In chest radiography, vision-language pre