Smooth minimization of nonsmooth functions with parallel coordinate descent methods

doi:10.48550/arXiv.1309.5885

We study the performance of a family of randomized parallel coordinate descent methods for minimizing the sum of nonsmooth and separable convex functions. The problem class includes as special cases L1-regularized L1 regression and the minimization of the exponential loss (the "AdaBoost problem"). We assume the input data defining the loss function is contained in a sparse $m\times n$ matrix $A$ with at most $\omega$ nonzeros in each row. Our methods need $O(n \beta/\tau)$ iterations to find an approximate solution with high probability, where $\tau$ is the number of processors and $\beta = 1 + (\omega-1)(\tau-1)/(n-1)$ for the fastest variant. The $O(\cdot)$ notation hides dependence on quantities such as the required accuracy and confidence levels and the distance of the starting iterate from an optimal point. Since $\beta/\tau$ is a decreasing function of $\tau$, the method needs fewer iterations when more processors are used. Certain variants of our algorithms perform on average only $O(\mathrm{nnz}(A)/n)$ arithmetic operations during a single iteration per processor and, because $\beta$ decreases when $\omega$ does, fewer iterations are needed for sparser problems.
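The dependence of the iteration bound on $\tau$ and $\omega$ can be checked numerically. The following is a minimal sketch, not code from the paper: the function names `beta` and `iteration_factor` are hypothetical, and the second function returns only the factor $n\beta/\tau$, ignoring the accuracy, confidence, and distance terms that the $O(\cdot)$ notation hides.

```python
def beta(tau: int, omega: int, n: int) -> float:
    """beta = 1 + (omega - 1)(tau - 1)/(n - 1), the formula quoted in the abstract.

    tau   : number of processors (assumed 1 <= tau <= n)
    omega : maximum number of nonzeros in any row of A (assumed 1 <= omega <= n)
    n     : number of columns of A, i.e. number of coordinates (n >= 2)
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if not (1 <= tau <= n and 1 <= omega <= n):
        raise ValueError("expected 1 <= tau <= n and 1 <= omega <= n")
    return 1.0 + (omega - 1) * (tau - 1) / (n - 1)


def iteration_factor(tau: int, omega: int, n: int) -> float:
    """Illustrative helper (hypothetical name): the factor n * beta / tau only.

    Constants hidden by the O(.) notation are deliberately left out.
    """
    return n * beta(tau, omega, n) / tau


if __name__ == "__main__":
    n, omega = 1000, 10  # made-up problem sizes, for illustration only
    for tau in (1, 2, 4, 8, 16):
        print(tau, beta(tau, omega, n), iteration_factor(tau, omega, n))
```

Writing $c = (\omega-1)/(n-1)$ gives $\beta/\tau = c + (1-c)/\tau$, so the factor is strictly decreasing in $\tau$ whenever $\omega < n$ and constant in the fully dense case $\omega = n$.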

Publication:

arXiv e-prints

Pub Date:

September 2013

DOI:

10.48550/arXiv.1309.5885

arXiv:

arXiv:1309.5885

Bibcode:

2013arXiv1309.5885F

Keywords:

Computer Science - Distributed, Parallel, and Cluster Computing;
Mathematics - Optimization and Control;
Statistics - Machine Learning

E-Print:

39 pages, 1 algorithm, 3 figures, 2 tables

NASA/ADS
