AI RESEARCH

To Grok Grokking: Provable Grokking in Ridge Regression

arXiv cs.LG

arXiv:2601.19791v3 Announce Type: replace

We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the