Flatness and Generalization: Learning Multi-Index Models with Homogeneous Neural Networks

arXiv:2606.04429v1 Announce Type: cross A common heuristic used to explain the generalization of first-order gradient methods on non-convex neural networks is that "flat interpolators generalize well" (Hochreiter and Schmidhuber, 1994; Keskar et al., 2017), where flatness can be measured by the trace of the Hessian of the empirical loss. However, Dinh et al. (2017) showed that any interpolator can be made sharper or flatter by exploiting symmetries of the network that change flatness while leaving the population and empirical losses unchanged. Taken literally, the heuristic is therefore vacuous.
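
The rescaling argument can be illustrated numerically. The sketch below is not the paper's construction: it assumes a two-layer ReLU network f(x) = a^T relu(Wx) with squared loss, and the names `forward`, `loss`, `hessian_trace`, the sizes, and the random data are all hypothetical. Because ReLU is positively homogeneous, the map (W, a) -> (cW, a/c) with c > 0 leaves every prediction unchanged, while the Hessian trace becomes c^2 A + B / c^2, where A and B are the contributions of the output and hidden layers.

```python
import numpy as np

def forward(W, a, X):
    # Two-layer ReLU network f(x) = a^T relu(W x), evaluated on the rows of X.
    return np.maximum(X @ W.T, 0.0) @ a

def loss(W, a, X, y):
    # Empirical squared loss (with the conventional factor 1/2).
    return 0.5 * np.mean((forward(W, a, X) - y) ** 2)

def trace_parts(W, a, X):
    # f is piecewise linear in each single parameter, so the diagonal of the
    # loss Hessian equals the mean squared partial derivative of f (valid
    # away from the ReLU kinks). Returns the output-layer part A and the
    # hidden-layer part B separately.
    pre = X @ W.T                          # pre-activations, shape (n, m)
    act = np.maximum(pre, 0.0)
    gate = (pre > 0).astype(float)
    A = np.mean(np.sum(act ** 2, axis=1))  # d f / d a_k = relu(w_k . x)
    # d f / d W_kj = a_k * gate_k * x_j, summed over j gives ||x||^2
    B = np.mean(np.sum((gate * a) ** 2, axis=1) * np.sum(X ** 2, axis=1))
    return A, B

def hessian_trace(W, a, X):
    A, B = trace_parts(W, a, X)
    return A + B

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
W = rng.normal(size=(4, 3))
a = rng.normal(size=4)
y = forward(W, a, X)                       # labels the network fits exactly

A, B = trace_parts(W, a, X)
c_flat = (B / A) ** 0.25                   # minimizes c^2 A + B / c^2 over c > 0
c_sharp = 10.0 * c_flat

def rescale(c):
    # Symmetry of the network: the function is unchanged for any c > 0.
    return c * W, a / c

trace_base = hessian_trace(W, a, X)
trace_flat = hessian_trace(*rescale(c_flat), X)
trace_sharp = hessian_trace(*rescale(c_sharp), X)
losses = [loss(*rescale(c), X, y) for c in (1.0, c_flat, c_sharp)]

print("losses:", losses)
print("trace (original, flattest rescaling, sharper rescaling):",
      trace_base, trace_flat, trace_sharp)
```

All three parameter settings interpolate the same data and compute the same function, yet their Hessian traces differ, and `c_sharp` can be pushed further to make the trace arbitrarily large. This is the sense in which flatness alone cannot distinguish them.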