Deep networks learn to parse uniform-depth context-free languages from local statistics

ArXi:2602.06065v3 Announce Type: replace-cross Understanding how the structure of language can be learned from sentences alone is a central question in both cognitive science and machine learning. Studies of the internal representations of Large Language Models (LLMs) their ability to parse text when predicting the next word, while representing semantic notions independently of surface form. Yet, which data statistics make these feats possible, and how much data is required, remain largely unknown.