Discovering Lexical Gaps Using Embeddings from Multilingual LLMs
arXiv CS.LG
arXiv:2605.24310v1 Announce Type: cross Lexical gaps are words that do not exist in certain languages. They pose challenges for building multilingual lexical resources, for machine translation, and for cross-lingual transfer. Existing lexical gap detection relies on human judgments or fixed conceptual taxonomies. We propose a data-driven framework for identifying cross-lingual lexical gaps. We extract contextualized embeddings from Korean-English bilingual LLMs for Korean-to-English and English-to-Korean translation pairs.
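
As a rough illustration of how embeddings of translation pairs might be compared, the sketch below mean-pools token vectors for each side of a pair and flags pairs whose cosine similarity falls below a cutoff. This is an assumption made for illustration, not the method of the paper: the abstract does not state how gaps are scored. The helper names (`mean_pool`, `cosine`, `candidate_gaps`), the `threshold` value, the example word pairs, and the two-dimensional toy vectors are all hypothetical. In practice the vectors would come from a bilingual LLM's hidden states.

```python
import math

def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into a single vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def candidate_gaps(pairs, threshold=0.5):
    """Return (source, target, similarity) for pairs scoring below threshold.

    `pairs` maps (source_word, target_word) to a tuple of
    (source_token_vectors, target_token_vectors). The threshold is an
    arbitrary illustrative cutoff, not a value taken from the paper.
    """
    flagged = []
    for (src, tgt), (src_vecs, tgt_vecs) in pairs.items():
        sim = cosine(mean_pool(src_vecs), mean_pool(tgt_vecs))
        if sim < threshold:
            flagged.append((src, tgt, sim))
    return flagged

# Toy data: made-up 2-D vectors standing in for real contextualized embeddings.
toy_pairs = {
    ("사과", "apple"): ([[1.0, 0.0], [1.0, 0.2]], [[0.9, 0.1]]),
    ("정", "affection"): ([[0.0, 1.0]], [[1.0, 0.1]]),
}

flagged = candidate_gaps(toy_pairs)
for src, tgt, sim in flagged:
    print(f"{src} -> {tgt}: similarity {sim:.3f}")
```

With these invented vectors only the second pair falls below the cutoff. A real pipeline would need to choose the pooling strategy, the model layer, and the threshold empirically.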