Learning Deep ReLU Networks Is Fixed-Parameter Tractable
Abstract
We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network (we show that some dependence on the Lipschitz constant is necessary). We also give a bound that is doubly exponential in the size of the network but is independent of spectral norm. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of depth three or higher requires exponential time in the ambient dimension, even when the above parameters are bounded by a constant. Additionally, all prior work for the depth-two case requires well-conditioned weights and/or positive coefficients to obtain efficient runtimes. Our algorithm does not require these assumptions. Our main technical tool is a type of filtered PCA that can be used to iteratively recover an approximate basis for the subspace spanned by the hidden units in the first layer. Our analysis leverages new structural results on lattice polynomials from tropical geometry.
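The abstract only names filtered PCA, so the following is a minimal sketch of the general principle it rests on, not the authors' algorithm. It relies on one fact about Gaussian inputs: if f(x) depends on x only through W1 x, then conditioning on any event defined by f(x) leaves the component of x orthogonal to the row span of W1 standard Gaussian. The filtered second moment E[x x^T | filter] - I therefore vanishes on that orthogonal complement, and its leading eigenvectors lie in the first-layer span. The toy depth-3 network, the filter (keep the top 20% of outputs), and the helper names `toy_net` and `filtered_pca` are hypothetical choices made for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def toy_net(X, W1, W2, a):
    """Hypothetical depth-3 ReLU network f(x) = a . relu(W2 relu(W1 x))."""
    return relu(relu(X @ W1.T) @ W2.T) @ a

def filtered_pca(X, y, quantile=0.8, k=1):
    """Top-k eigenvectors of the filtered second moment E[x x^T | y >= t] - I."""
    t = np.quantile(y, quantile)             # filter threshold on the outputs
    Xf = X[y >= t]                           # keep only the filtered samples
    M = Xf.T @ Xf / len(Xf) - np.eye(X.shape[1])
    vals, vecs = np.linalg.eigh(M)           # M is symmetric
    order = np.argsort(-np.abs(vals))        # largest |eigenvalue| first
    return vals[order[:k]], vecs[:, order[:k]]

rng = np.random.default_rng(0)
d, n = 10, 20000
W1 = np.zeros((2, d))
W1[0, 0] = 1.0                               # first-layer rows span e1, e2
W1[1, 1] = 1.0
W2 = np.array([[1.0, 1.0], [1.0, -1.0]])
a = np.array([1.0, 1.0])

X = rng.standard_normal((n, d))              # Gaussian inputs
y = toy_net(X, W1, W2, a)
vals, basis = filtered_pca(X, y, k=1)

# Fraction of the recovered direction's squared norm inside span(e1, e2).
in_span = float(np.sum(basis[:2, 0] ** 2))
print(in_span)
```

With enough samples, `in_span` should be close to 1: the filter changes the variance only along directions the network actually reads, while the sampling noise in the other coordinates shrinks as the number of filtered samples grows.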
Publication: arXiv e-prints
Pub Date: September 2020
arXiv: arXiv:2009.13512
Bibcode: 2020arXiv200913512C
Keywords: Computer Science - Machine Learning; Computer Science - Data Structures and Algorithms; Statistics - Machine Learning
E-Print: 39 pages