AI RESEARCH
A Pre-Training Analogue of Grokking in Language Models: Tracing Delayed Grammatical Generalization
arXiv cs.LG
arXiv:2606.00230v1 Announce Type: new Grokking, the phenomenon in which neural networks generalize long after fitting their