Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices

doi:10.48550/arXiv.1912.10754

Mourtada, Jaouad

We consider random-design linear prediction and related questions on the lower tail of random matrices. It is known that, under boundedness constraints, the minimax risk is of order $d/n$ in dimension $d$ with $n$ samples. Here, we study the minimax expected excess risk over the full linear class, depending on the distribution of covariates. First, the least squares estimator is exactly minimax optimal in the well-specified case, for every distribution of covariates. We express the minimax risk in terms of the distribution of statistical leverage scores of individual samples, and deduce a minimax lower bound of $d/(n-d+1)$ for any covariate distribution, nearly matching the risk for Gaussian design. We then obtain sharp nonasymptotic upper bounds for covariates that satisfy a "small ball"-type regularity condition in both well-specified and misspecified cases. Our main technical contribution is the study of the lower tail of the smallest singular value of empirical covariance matrices at small values. We establish a lower bound on this lower tail, valid for any distribution in dimension $d \geq 2$, together with a matching upper bound under a necessary regularity condition. Our proof relies on the PAC-Bayes technique for controlling empirical processes, and extends an analysis of Oliveira devoted to a different part of the lower tail.
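As a rough numerical illustration of the quantities named in the abstract, the sketch below computes statistical leverage scores for a Gaussian design and compares a Monte Carlo estimate of the least squares excess risk with the lower bound $d/(n-d+1)$. It is not taken from the paper. It assumes unit noise variance, identity covariance, and the standard inverse-Wishart identity giving a Gaussian-design risk of $d/(n-d-1)$. The function names and the values of `n`, `d`, and `trials` are illustrative choices.

```python
import numpy as np


def leverage_scores(X):
    """Leverage scores: diagonal of the hat matrix X (X^T X)^{-1} X^T."""
    # With a thin QR factorization X = QR, the hat matrix is Q Q^T,
    # so the i-th leverage score is the squared norm of the i-th row of Q.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q ** 2, axis=1)


def ols_excess_risk_gaussian(n, d, sigma2=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of the expected OLS excess risk, N(0, I_d) design.

    Assumes a well-specified model with noise variance sigma2. Conditionally
    on X, the excess risk is then sigma2 * trace((X^T X)^{-1}).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, d))
        total += sigma2 * np.trace(np.linalg.inv(X.T @ X))
    return total / trials


n, d = 50, 5  # illustrative sample size and dimension
X = np.random.default_rng(1).standard_normal((n, d))
h = leverage_scores(X)

lower_bound = d / (n - d + 1)     # bound stated in the abstract (unit noise variance assumed)
gaussian_exact = d / (n - d - 1)  # inverse-Wishart mean, standard fact for Gaussian design
mc_estimate = ols_excess_risk_gaussian(n, d)

print(f"sum of leverage scores: {h.sum():.6f} (equals d = {d})")
print(f"lower bound d/(n-d+1):  {lower_bound:.4f}")
print(f"Gaussian d/(n-d-1):     {gaussian_exact:.4f}")
print(f"Monte Carlo estimate:   {mc_estimate:.4f}")
```

Under these assumptions the Monte Carlo estimate should land close to $d/(n-d-1)$ and slightly above $d/(n-d+1)$, which is the sense in which the lower bound nearly matches the Gaussian risk.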

Publication:

arXiv e-prints

Pub Date:

December 2019

DOI:

10.48550/arXiv.1912.10754

arXiv:

arXiv:1912.10754

Bibcode:

2019arXiv191210754M

Keywords:

Mathematics - Statistics Theory;
Mathematics - Probability;
Statistics - Machine Learning;
62J05;
60B20;
62C20

E-Print:

39 pages
