Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions
Abstract
Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal-processing and machine-learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes, where both the number of measurements and the number of unknown parameters are large, has only recently begun to emerge. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of convex ERM for inference in high-dimensional generalized linear models. For a stylized setting with Gaussian features and problem dimensions that grow large at a proportional rate, we start with sharp performance characterizations and then derive tight lower bounds on the estimation and prediction error that hold over a wide class of loss functions and for any value of the regularization parameter. Our precise analysis has several attributes. First, it leads to a recipe for optimally tuning the loss function and the regularization parameter. Second, it allows us to precisely quantify the suboptimality of popular heuristic choices: for instance, we show that optimally tuned least-squares is (perhaps surprisingly) approximately optimal for standard logistic data, but the suboptimality gap grows drastically as the signal strength increases. Third, we use the bounds to precisely assess the merits of ridge regularization as a function of the overparameterization ratio. Notably, our bounds are expressed in terms of the Fisher information of random variables that are simple functions of the data distribution, thus drawing connections to corresponding bounds in classical statistics.
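To make the setting concrete, the following is an illustrative sketch (not from the paper) of ridge-regularized ERM with a logistic loss on synthetic Gaussian features, in the proportional regime where n and p grow together. All names, dimensions, and parameter values here are hypothetical choices for demonstration, and plain gradient descent stands in for whatever solver one would actually use:

```python
import numpy as np

def ridge_erm_logistic(X, y, lam=0.1, lr=0.1, n_iter=500):
    """Gradient descent on (1/n) sum_i log(1 + exp(-y_i <x_i, w>)) + (lam/2) ||w||^2.

    X: (n, p) feature matrix, y: labels in {-1, +1}. Returns the estimate w_hat.
    """
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        margins = y * (X @ w)
        # Gradient of the logistic loss plus the ridge penalty.
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + lam * w
        w -= lr * grad
    return w

# Synthetic "standard logistic data": Gaussian features, labels drawn from
# a logistic model, with n/p held at a fixed (hypothetical) ratio of 2.
rng = np.random.default_rng(0)
n, p = 400, 200
w_star = rng.standard_normal(p) / np.sqrt(p)   # signal with norm ~ 1
X = rng.standard_normal((n, p))                # i.i.d. Gaussian features
probs = 1.0 / (1.0 + np.exp(-(X @ w_star)))
y = np.where(rng.random(n) < probs, 1.0, -1.0)

w_hat = ridge_erm_logistic(X, y, lam=0.1)
```

The estimation error of w_hat relative to w_star, as a function of the loss, the regularization parameter lam, and the ratio n/p, is the kind of quantity whose fundamental limits the paper characterizes.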
 Publication:

arXiv e-prints
 Pub Date:
 June 2020
 arXiv:
 arXiv:2006.08917
 Bibcode:
 2020arXiv200608917T
 Keywords:

 Statistics - Machine Learning;
 Computer Science - Information Theory;
 Computer Science - Machine Learning;
 Electrical Engineering and Systems Science - Signal Processing