AI RESEARCH

Conceptual Schema Inference for Tabular Datasets using Large Language Models

arXiv CS.AI

arXiv:2509.04632v2 Announce Type: replace-cross

Large collections of tabular data from data lakes, web tables and open data portals often originate from heterogeneous sources, leading to representational inconsistencies. Understanding and organizing such repositories therefore remains a major challenge. While prior work has primarily focused on dataset discovery and exploration, this paper addresses the complementary problem of conceptual schema inference: automatically deriving a conceptual schema that captures entity types, attributes and inter-type relationships directly from raw tables.