Variance Reduction via Accelerated Dual Averaging for Finite-Sum Optimization
Abstract
In this paper, we introduce a simplified and unified method for finite-sum convex optimization, named \emph{Variance Reduction via Accelerated Dual Averaging (VRADA)}. In both the general convex and strongly convex settings, VRADA can attain an $O\big(\frac{1}{n}\big)$-accurate solution in $O(n\log\log n)$ stochastic gradient evaluations, which improves the best-known result of $O(n\log n)$, where $n$ is the number of samples. Meanwhile, VRADA matches the lower bound of the general convex setting up to a $\log\log n$ factor and matches the lower bounds of the strongly convex setting in both regimes $n\le \Theta(\kappa)$ and $n\gg \kappa$, where $\kappa$ denotes the condition number. Besides improving the best-known results and matching all the above lower bounds simultaneously, VRADA admits a more unified and simplified algorithmic implementation and convergence analysis for both the general convex and strongly convex settings. The underlying novel approaches, such as the novel initialization strategy in VRADA, may be of independent interest. Through experiments on real datasets, we demonstrate the strong performance of VRADA over existing methods for large-scale machine learning problems.
Publication: arXiv e-prints
Pub Date: June 2020
arXiv: arXiv:2006.10281
Bibcode: 2020arXiv200610281S
Keywords: Mathematics - Optimization and Control; Computer Science - Machine Learning
E-Print: 19 pages, 12 figures