To Grok Grokking: Provable Grokking in Ridge Regression
arXiv cs.LG
arXiv:2601.19791v3 (Announce Type: replace)

We study grokking, the onset of generalization long after overfitting, in a classical ridge regression setting. We prove end-to-end grokking results for learning over-parameterized linear regression models using gradient descent with weight decay. Specifically, we prove that the following stages occur: (i) the model overfits the …
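The abstract names the setting but not the construction, so the following is only a minimal sketch of that general setting (full-batch gradient descent with weight decay on an over-parameterized linear regression problem, with fewer samples than parameters). It is not the paper's model, data distribution, or proof. The problem sizes, the step size `lr`, the weight-decay strength `lam`, the large initialization scale, and the helper names `make_data` and `gd_weight_decay` are hypothetical choices made for illustration. Under these assumptions, the training error collapses within roughly the first thousand steps, while the test error stays high until weight decay slowly removes the part of the initialization that the training data cannot see.

```python
import numpy as np


def make_data(n=100, d=200, n_test=1000, seed=0):
    """Hypothetical setup: noiseless isotropic Gaussian inputs and a random linear teacher."""
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d) / np.sqrt(d)   # teacher weights, ||w_star||^2 is about 1
    X = rng.standard_normal((n, d))                # n < d, so the model is over-parameterized
    X_test = rng.standard_normal((n_test, d))
    return X, X @ w_star, X_test, X_test @ w_star


def gd_weight_decay(X, y, X_test, y_test, lam=1e-3, lr=0.1, steps=30_000,
                    init_scale=10.0, log_every=1000, seed=1):
    """Full-batch GD on (1/2n)||Xw - y||^2 + (lam/2)||w||^2 from a large random init.

    The data gradient X^T(Xw - y)/n always lies in the row space of X, so the
    component of w orthogonal to that space shrinks only through weight decay,
    by a factor (1 - lr*lam) per step. Returns (step, train_mse, test_mse) tuples.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = init_scale * rng.standard_normal(d) / np.sqrt(d)  # large init norm (assumption)
    history = []
    for t in range(steps):
        grad = X.T @ (X @ w - y) / n + lam * w            # squared-loss gradient + weight decay
        w = w - lr * grad
        if t % log_every == 0 or t == steps - 1:
            train_mse = float(np.mean((X @ w - y) ** 2))
            test_mse = float(np.mean((X_test @ w - y_test) ** 2))
            history.append((t, train_mse, test_mse))
    return history


X, y, X_test, y_test = make_data()
history = gd_weight_decay(X, y, X_test, y_test)
for step, train_mse, test_mse in history[::5]:
    print(f"step {step:6d}  train MSE {train_mse:.2e}  test MSE {test_mse:.2e}")
```

In this toy run the delay comes entirely from the large initialization combined with a small `lam`: shrinking `init_scale` removes the gap, and raising `lam` shortens it. This is an illustrative mechanism only and is not claimed to be the one analyzed in the paper.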