TextTeacher: What Can Language Teach About Images?

arXiv:2605.22098v1 Announce Type: new

The platonic representation hypothesis suggests that sufficiently large models converge to a shared representation geometry, even across modalities. Motivated by this, we ask: Can the semantic knowledge of a language model efficiently improve a vision model? As an answer, we ...

Project page with code and captions