AI RESEARCH
When Both Layers Learn: Training Dynamics of Representing Linear Models via ReLU Networks
arXiv cs.LG
arXiv:2606.04476v1 Announce Type: new
In this paper, we study the gradient descent dynamics for jointly