AI RESEARCH

SemStruct: Contextualizing Semantic Embeddings with Structural Information for Schema Matching

arXiv CS.LG

arXiv:2605.30729v1

Schema matching is a fundamental step in integrating heterogeneous data sources. While Pre-trained Language Models (PLMs) have revolutionized this task by capturing linguistic semantics, they typically process tabular data as serialized text sequences of standalone column descriptions. This serialization discards critical structural information, specifically the row-level co-occurrences (i.e., the relational context), forcing models to rely solely on column header semantics or standalone distributions.
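To make the information loss concrete, here is a minimal sketch. It is an illustration of the problem only, not the paper's method. It assumes a toy two-column table and a hypothetical serializer (`serialize_column`, `serialize_table`) that describes each column by its header and sorted values. Two tables with the same columns but different row pairings produce identical serialized text, even though their row-level co-occurrences differ.

```python
from collections import Counter

# Toy table (hypothetical data): each row pairs a city with a country.
headers = ("city", "country")
rows = [
    ("Paris", "France"),
    ("Lyon", "France"),
    ("Berlin", "Germany"),
]

# Same column contents, but the country column is re-paired with other rows.
scrambled = [
    ("Paris", "Germany"),
    ("Lyon", "France"),
    ("Berlin", "France"),
]


def serialize_column(header, values):
    """Standalone column description: the header plus its sorted values."""
    return f"{header}: " + ", ".join(sorted(values))


def serialize_table(headers, rows):
    """Serialize every column independently, ignoring row alignment."""
    columns = list(zip(*rows))
    return [serialize_column(h, col) for h, col in zip(headers, columns)]


def co_occurrences(rows):
    """Row-level co-occurrence counts of (city, country) pairs."""
    return Counter(rows)


# The column-wise text cannot tell the two tables apart...
same_text = serialize_table(headers, rows) == serialize_table(headers, scrambled)
# ...but the row-level pairings are different.
same_rows = co_occurrences(rows) == co_occurrences(scrambled)

print(same_text, same_rows)  # prints: True False
```

Sorting the values inside `serialize_column` is a deliberate simplification: it stands in for any column description that summarizes a column in isolation, which is exactly the case where row alignment cannot be recovered.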