MemorySample Tradeoffs for Linear Regression with Small Error
Abstract
We consider the problem of performing linear regression over a stream of $d$dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2)\ldots,$ with $a_i$ drawn independently from a $d$dimensional isotropic Gaussian, and where $b_i = \langle a_i, x\rangle + \eta_i,$ for a fixed $x \in \mathbb{R}^d$ with $\x\_2 = 1$ and with independent noise $\eta_i$ drawn uniformly from the interval $[2^{d/5},2^{d/5}].$ We show that any algorithm with at most $d^2/4$ bits of memory requires at least $\Omega(d \log \log \frac{1}{\epsilon})$ samples to approximate $x$ to $\ell_2$ error $\epsilon$ with probability of success at least $2/3$, for $\epsilon$ sufficiently small as a function of $d$. In contrast, for such $\epsilon$, $x$ can be recovered to error $\epsilon$ with probability $1o(1)$ with memory $O\left(d^2 \log(1/\epsilon)\right)$ using $d$ examples. This represents the first nontrivial lower bounds for regression with superlinear memory, and may open the door for strong memory/sample tradeoffs for continuous optimization.
 Publication:

arXiv eprints
 Pub Date:
 April 2019
 arXiv:
 arXiv:1904.08544
 Bibcode:
 2019arXiv190408544S
 Keywords:

 Computer Science  Machine Learning;
 Statistics  Machine Learning
 EPrint:
 22 pages, to appear in STOC'19