How Far Are We from Generating Missing Modalities with Foundation Models?

arXiv:2506.03530v3 Announce Type: replace-cross Multimodal foundation models have demonstrated impressive capabilities across diverse tasks. However, their potential as plug-and-play solutions for missing modality reconstruction remains underexplored. To bridge this gap, we identify and formalize three potential paradigms for missing modality reconstruction, and perform a comprehensive evaluation across these paradigms, covering 42 model variants in terms of reconstruction accuracy and adaptability to downstream tasks.