Accelerated Dual Learning by Homotopic Initialization
Abstract
Gradient descent and coordinate descent are well understood in terms of their asymptotic behavior, but less so in the transient regime that is often used for approximations in machine learning. We investigate how proper initialization can have a profound effect on finding near-optimal solutions quickly. We show that a certain property of a data set, namely the boundedness of the correlations between eigenfeatures and the response variable, can lead to faster initial progress than standard analysis predicts. Convex optimization problems can tacitly benefit from this property, but the benefit does not carry over automatically to their dual formulation. We analyze this phenomenon and devise provably good initialization strategies for dual optimization, as well as heuristics for the non-convex case, which is relevant for deep learning. We find our predictions and methods to be well supported experimentally.
- Publication: arXiv e-prints
- Pub Date: June 2017
- DOI: 10.48550/arXiv.1706.03958
- arXiv: arXiv:1706.03958
- Bibcode: 2017arXiv170603958D
- Keywords: Computer Science - Machine Learning