Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection
Abstract
There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that, except in extreme cases, its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.
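As a concrete illustration of the two selection rules being compared (a minimal sketch, not code from the paper; the function and variable names are our own), the Gauss-Southwell rule picks the coordinate with the largest absolute partial derivative, while random selection samples a coordinate uniformly. For a quadratic objective, exact coordinate optimization also has a closed form:

```python
import numpy as np

def coordinate_descent(A, b, rule="gauss-southwell", iters=1000, seed=0):
    """Minimize f(x) = 0.5*x^T A x - b^T x by exact coordinate descent.

    rule="gauss-southwell" updates the coordinate with the largest
    absolute partial derivative; rule="random" samples one uniformly.
    Illustrative sketch only; setup and names are assumptions.
    """
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    grad = A @ x - b  # gradient of f at x
    for _ in range(iters):
        if rule == "gauss-southwell":
            i = int(np.argmax(np.abs(grad)))  # Gauss-Southwell rule
        else:
            i = int(rng.integers(n))  # uniform random selection
        # Exact minimization along coordinate i: for a quadratic, the
        # coordinate-wise Lipschitz constant is the diagonal entry A[i, i].
        step = grad[i] / A[i, i]
        x[i] -= step
        grad -= step * A[:, i]  # O(n) gradient maintenance
    return x

# Small strongly convex quadratic for demonstration
rng = np.random.default_rng(1)
M = rng.standard_normal((10, 5))
A = M.T @ M + 0.1 * np.eye(5)
b = rng.standard_normal(5)
x_gs = coordinate_descent(A, b, rule="gauss-southwell")
x_star = np.linalg.solve(A, b)  # exact minimizer
```

On problems like this, both rules converge to the minimizer; the paper's analysis concerns how much faster the Gauss-Southwell rule can be per iteration when its selection cost is comparable to random selection.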
 Publication:

arXiv e-prints
 Pub Date:
 June 2015
 arXiv:
 arXiv:1506.00552
 Bibcode:
 2015arXiv150600552N
 Keywords:

 Mathematics - Optimization and Control;
 Computer Science - Machine Learning;
 Statistics - Computation;
 Statistics - Machine Learning
 E-Print:
 ICML 2015. v2: Updated the Gauss-Southwell-q result in Section 8 and Appendix H, to remove the part depending on mu_1 (the proof had an error). Added Section 8.1, which discusses conditions under which a rate depending on mu_1 does hold