Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks

doi:10.48550/arXiv.1711.07354

By lifting the ReLU function into a higher dimensional space, we develop a smooth multi-convex formulation for training feed-forward deep neural networks (DNNs). This allows us to develop a block coordinate descent (BCD) training algorithm consisting of a sequence of numerically well-behaved convex optimizations. Using ideas from proximal point methods in convex analysis, we prove that this BCD algorithm will converge globally to a stationary point with R-linear convergence rate of order one. In experiments with the MNIST database, DNNs trained with this BCD algorithm consistently yielded better test-set error rates than identical DNN architectures trained via all the stochastic gradient descent (SGD) variants in the Caffe toolbox.
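The abstract does not spell out the lifting itself. One common way to read it, assumed here, is that ReLU can be written as the solution of a small convex problem: max(x, 0) is the point u >= 0 closest to x. The sketch below checks that identity numerically for a scalar with projected gradient descent. The function names, step size, and iteration count are illustrative choices, not taken from the paper.

```python
def relu(x):
    """Standard ReLU for a scalar input."""
    return max(x, 0.0)


def relu_via_projection(x, steps=200, lr=0.25):
    """Approximate argmin over u >= 0 of (u - x)^2 by projected gradient descent.

    Hypothetical helper for illustration only; it is not the paper's BCD algorithm.
    """
    u = 0.0
    for _ in range(steps):
        grad = 2.0 * (u - x)          # gradient of (u - x)^2 with respect to u
        u = max(u - lr * grad, 0.0)   # gradient step, then project onto u >= 0
    return u


if __name__ == "__main__":
    for x in (-2.0, 0.0, 1.5):
        print(x, relu(x), relu_via_projection(x))
```

Because the constrained problem is convex in u, treating u as an extra variable is what would make each block update in a scheme of this kind a convex optimization, under the assumption stated above.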

Publication: arXiv e-prints
Pub Date: November 2017
DOI: 10.48550/arXiv.1711.07354
arXiv: arXiv:1711.07354
Bibcode: 2017arXiv171107354Z
Keywords: Statistics - Machine Learning; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Machine Learning
E-Print: NIPS 2017
