AI RESEARCH

Feature Learning in Wide Neural Networks under $\mu$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

arXiv cs.LG

arXiv:2605.24710v1 Announce Type: new We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($\mu$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $\mu$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has uniform-in-time squared-Wasserstein rate $O(N^{-1.