Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems
Abstract
In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $\epsilon \geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta}\epsilon^{2\frac{1-\delta}{1+1.01\delta}-o(1)})$ oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta}\epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/\epsilon)$ but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition, since with quadratic $\mathcal O(d^2 \ln 1/\epsilon)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/\epsilon)$ queries.
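The gradient-descent baseline mentioned in the abstract can be illustrated concretely. The following is a minimal sketch (not the paper's construction): at each step a separation oracle either certifies that the current point is feasible or returns a unit normal of a separating hyperplane; the algorithm takes a step of size $\epsilon$ against that normal and projects back onto the unit ball. It stores only the current iterate, i.e. $\mathcal O(d)$ numbers, and the standard analysis bounds the number of oracle calls by $\mathcal O(1/\epsilon^2)$. The oracle interface and the toy ball-shaped feasible set below are illustrative assumptions.

```python
import numpy as np

def solve_feasibility(oracle, d, eps, max_iters=None):
    """Subgradient-style feasibility solver (illustrative sketch).

    `oracle(x)` must return (True, None) if x is feasible, or
    (False, g) where g is a unit vector with <g, x> > <g, y> for
    every feasible y.  Memory usage is O(d); the classical distance-
    decrease argument gives at most O(1/eps^2) oracle queries.
    """
    if max_iters is None:
        max_iters = int(np.ceil(4.0 / eps**2)) + 1
    x = np.zeros(d)  # start at the center of the unit ball
    for t in range(max_iters):
        feasible, g = oracle(x)
        if feasible:
            return x, t
        x = x - eps * g              # step against the separating direction
        nrm = np.linalg.norm(x)
        if nrm > 1.0:                # project back onto the unit ball
            x = x / nrm
    return None, max_iters

def make_ball_oracle(c, eps):
    """Toy separation oracle for the feasible set {y : ||y - c|| <= eps}."""
    def oracle(x):
        diff = x - c
        nrm = np.linalg.norm(diff)
        if nrm <= eps:
            return True, None
        return False, diff / nrm     # unit normal separating x from the ball
    return oracle
```

Each infeasible query shrinks the squared distance to the hidden center by at least $\epsilon^2$ (since $\langle g, x-c\rangle \geq \epsilon$ for any center $c$ of a contained $\epsilon$-ball), which is what yields the $\mathcal O(1/\epsilon^2)$ query bound at only linear memory.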
 Publication:

arXiv e-prints
 Pub Date:
 April 2024
 DOI:
 10.48550/arXiv.2404.06720
 arXiv:
 arXiv:2404.06720
 Bibcode:
 2024arXiv240406720B
 Keywords:

 Mathematics - Optimization and Control;
 Computer Science - Computational Complexity;
 Computer Science - Data Structures and Algorithms;
 Computer Science - Machine Learning;
 Statistics - Machine Learning