AI RESEARCH

A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization

arXiv CS.LG

arXiv:2606.00230v1 Announce Type: new

Grokking, the phenomenon in which neural networks generalize long after fitting their