Geometry of the Loss Landscape in Overparameterized Neural Networks: Symmetries and Invariances
Abstract
We study how permutation symmetries in overparameterized multilayer neural networks generate `symmetry-induced' critical points. Assuming a network with $ L $ layers of minimal widths $ r_1^*, \ldots, r_{L-1}^* $ reaches a zero-loss minimum at $ r_1^*! \cdots r_{L-1}^*! $ isolated points that are permutations of one another, we show that adding one extra neuron to each layer is sufficient to connect all these previously discrete minima into a single manifold. For a two-layer overparameterized network of width $ r^* + h =: m $ we explicitly describe the manifold of global minima: it consists of $ T(r^*, m) $ affine subspaces of dimension at least $ h $ that are connected to one another. For a network of width $ m $, we identify the number $ G(r, m) $ of affine subspaces containing only symmetry-induced critical points that are related to the critical points of a smaller network of width $ r < r^* $. Via a combinatorial analysis, we derive closed-form formulas for $ T $ and $ G $ and show that the number of symmetry-induced critical subspaces dominates the number of affine subspaces forming the global minima manifold in the mildly overparameterized regime (small $ h $), and vice versa in the vastly overparameterized regime ($ h \gg r^* $). Our results provide new insights into the minimization of the nonconvex loss function of overparameterized neural networks.
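The permutation symmetry that generates the $r^*!$ copies of each minimum can be illustrated with a minimal sketch: in a two-layer network, relabeling the hidden neurons (permuting the rows of the first-layer weights and the matching columns of the second-layer weights) leaves the network function unchanged. The architecture and variable names below are a toy assumption for illustration, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer ReLU network f(x) = W2 @ relu(W1 @ x + b1)
d, m = 3, 5                      # input dimension, hidden width
W1 = rng.normal(size=(m, d))     # first-layer weights
b1 = rng.normal(size=m)          # first-layer biases
W2 = rng.normal(size=(1, m))     # second-layer weights

def f(x, W1, b1, W2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0)

# Permute the hidden neurons: reorder rows of W1 and entries of b1,
# and the corresponding columns of W2. The function is unchanged,
# so every minimum has m! permuted copies in parameter space.
perm = rng.permutation(m)
x = rng.normal(size=d)
out_original = f(x, W1, b1, W2)
out_permuted = f(x, W1[perm], b1[perm], W2[:, perm])
assert np.allclose(out_original, out_permuted)
```

Applied independently at each of the $L-1$ hidden layers, this symmetry accounts for the $r_1^*! \cdots r_{L-1}^*!$ isolated zero-loss minima in the minimal-width network.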
 Publication:

arXiv e-prints
 Pub Date:
 May 2021
 arXiv:
 arXiv:2105.12221
 Bibcode:
 2021arXiv210512221S
 Keywords:

 Computer Science - Machine Learning
 E-Print:
 29 pages, 12 figures, ICML 2021