Self-Regularity of Non-Negative Output Weights for Overparameterized Two-Layer Neural Networks
Abstract
We consider the problem of finding a two-layer neural network with sigmoid, rectified linear unit (ReLU), or binary step activation functions that "fits" a training data set as accurately as possible, as quantified by the training error; and study the following question: \emph{does a low training error guarantee that the norm of the output layer (outer norm) itself is small?} We answer this question affirmatively for the case of non-negative output weights. Using a simple covering number argument, we establish that, under quite mild distributional assumptions on the input/label pairs, any such network achieving a small training error on polynomially many data necessarily has a well-controlled outer norm. Notably, our results (a) have a polynomial (in $d$) sample complexity, (b) are independent of the number of hidden units (which can potentially be very high), (c) are oblivious to the training algorithm, and (d) require quite mild assumptions on the data (in particular, the input vector $X\in\mathbb{R}^d$ need not have independent coordinates). We then leverage our bounds to establish generalization guarantees for such networks through the \emph{fat-shattering dimension}, a scale-sensitive measure of the complexity of the class that the network architectures we investigate belong to. Notably, our generalization bounds also have good sample complexity (low-degree polynomials in $d$), and are in fact near-linear for some important cases of interest.
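To make the objects in the abstract concrete, the following is a minimal sketch (not from the paper) of the architecture studied: a two-layer ReLU network with non-negative output weights, its squared training error on a data set, and its outer norm, which we illustrate here as the $\ell_1$ norm of the output layer; the paper's precise choice of norm may differ. All names and the data-generating setup below are illustrative assumptions.

```python
import numpy as np

def two_layer_relu(X, W, a):
    """Two-layer network f(x) = sum_j a_j * relu(<w_j, x>).

    X: (n, d) inputs, W: (m, d) inner weights,
    a: (m,) output weights, assumed non-negative as in the abstract.
    """
    return np.maximum(X @ W.T, 0.0) @ a

rng = np.random.default_rng(0)
d, m, n = 5, 50, 200                  # input dim, hidden units, sample size
X = rng.standard_normal((n, d))       # inputs need not have i.i.d. coordinates
W = rng.standard_normal((m, d))
a = rng.uniform(0.0, 0.1, size=m)     # non-negative output weights

# Labels generated by the network itself, so the training error is zero.
Y = two_layer_relu(X, W, a)
train_err = np.mean((two_layer_relu(X, W, a) - Y) ** 2)

# "Outer norm": norm of the output layer (here, the l1 norm of a).
outer_norm = np.sum(np.abs(a))

print(train_err, outer_norm)
```

The self-regularity phenomenon the paper establishes is that, under mild distributional assumptions and polynomially many samples, a small `train_err` forces `outer_norm` to be small, regardless of the width `m` or of how the network was trained.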
 Publication:

IEEE Transactions on Signal Processing
 Pub Date:
 2022
 DOI:
 10.1109/TSP.2022.3156702
 arXiv:
 arXiv:2103.01887
 Bibcode:
 2022ITSP...70.1310G
 Keywords:

 Statistics - Machine Learning;
 Computer Science - Machine Learning;
 Mathematics - Probability;
 Mathematics - Statistics Theory
 E-Print:
 34 pages. Some of the results in the present paper are significantly strengthened versions of certain results appearing in arXiv:2003.10523