Repeated Sequences Reveal Gaps between Large Language Models and Natural Language

arXiv:2605.24850v1

Evaluating whether large language models (LLMs) capture the structure of natural language beyond local fluency remains an open challenge. Existing evaluation methods, which rely largely on task performance or short-context behavior, offer limited insight into the long-range statistical organization of generated text. We propose a complementary evaluation framework based on repeated subsequences.
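The abstract does not say which repetition statistics the framework uses. To make "repeated subsequences" concrete, here is a minimal, hypothetical sketch. It assumes that a repeated subsequence is a contiguous token n-gram occurring at least twice in one text, and it computes two simple statistics of that kind. The function names `repeated_ngram_counts` and `longest_repeated_length` are illustrative only and are not taken from the paper.

```python
from collections import Counter


def repeated_ngram_counts(tokens, max_n):
    """For each n in 1..max_n, count the distinct n-grams that occur at least twice.

    Hypothetical statistic for illustration; not the paper's actual method.
    """
    counts = {}
    for n in range(1, max_n + 1):
        # Count every contiguous n-gram (empty when n exceeds the sequence length).
        grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        counts[n] = sum(1 for c in grams.values() if c >= 2)
    return counts


def longest_repeated_length(tokens):
    """Length of the longest contiguous subsequence occurring at least twice.

    Overlapping occurrences count. If an n-gram repeats, its (n-1)-prefix also
    repeats, so we can stop at the first n with no repeat.
    """
    best = 0
    for n in range(1, len(tokens)):
        seen = set()
        found = False
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if gram in seen:
                found = True
                break
            seen.add(gram)
        if not found:
            break
        best = n
    return best
```

For example, on `"the cat sat on the mat the cat sat".split()`, `repeated_ngram_counts(tokens, 4)` returns `{1: 3, 2: 2, 3: 1, 4: 0}` and `longest_repeated_length(tokens)` returns `3`. Statistics like these could be computed for both model-generated and human-written text and compared across sequence lengths. Whether the paper does this is an assumption.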