Semantic Motion Anchors: Bridging Motion and Meaning in Co-Speech Gestures

arXiv:2605.30608v1 Announce Type: new

Learning a shared representation between spoken text and gesture is central to co-speech gesture retrieval, synthesis, and understanding. It remains challenging, however, for semantically meaningful gestures whose communicative intent is not captured by motion alone. Direct contrastive alignment between transcripts and continuous motion embeddings often overemphasizes low-level kinematics and misses the symbolic content of semantic gestures.
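The abstract contrasts its approach with "direct contrastive alignment" between transcripts and motion embeddings. A common form of that baseline is a CLIP-style symmetric InfoNCE loss, sketched below in NumPy. This is a generic illustration of the baseline, not the paper's Semantic Motion Anchors method. The function names, the temperature of 0.07, and the assumption that row i of each batch forms a matched transcript/motion pair are all illustrative choices.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(text_emb, motion_emb, temperature=0.07):
    """Hypothetical baseline: row i of text_emb is assumed to match row i of motion_emb."""
    t = l2_normalize(text_emb)
    m = l2_normalize(motion_emb)
    logits = t @ m.T / temperature  # (B, B) scaled cosine-similarity matrix
    idx = np.arange(logits.shape[0])
    # Text-to-motion: each transcript should score its own motion clip highest.
    loss_t2m = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Motion-to-text: each motion clip should score its own transcript highest.
    loss_m2t = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_t2m + loss_m2t)


# Toy usage: perfectly matched pairs versus deliberately mismatched pairs.
text = np.eye(4)
aligned_loss = symmetric_info_nce(text, np.eye(4))
mismatched_loss = symmetric_info_nce(text, np.eye(4)[::-1])
```

Because the loss only rewards matching each transcript to its own motion embedding, nothing in it explicitly encodes a gesture's symbolic meaning. Whatever features best separate the pairs in a batch, including purely kinematic ones, can satisfy the objective. This is consistent with the limitation the abstract describes.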