Lifted Proximal Operator Machines

doi:10.48550/arXiv.1811.01501


We propose a new optimization method for training feed-forward neural networks. By rewriting the activation function as an equivalent proximal operator, we approximate a feed-forward neural network by adding the proximal operators to the objective function as penalties; hence we call it the lifted proximal operator machine (LPOM). LPOM is block multi-convex in all layer-wise weights and activations. This allows us to use block coordinate descent to update the layer-wise weights and activations in parallel. Most notably, we only use the mapping of the activation function itself, rather than its derivatives, thus avoiding the gradient vanishing or blow-up issues of gradient-based training methods. Our method is therefore applicable to various non-decreasing Lipschitz continuous activation functions, which can be saturating and non-differentiable. LPOM does not require more auxiliary variables than the layer-wise activations, so it uses roughly the same amount of memory as stochastic gradient descent (SGD). We further prove the convergence of updating the layer-wise weights and activations. Experiments on the MNIST and CIFAR-10 datasets testify to the advantages of LPOM.
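The central idea above, that an activation function can be rewritten as a proximal operator, can be checked numerically. The sketch below assumes one standard construction (our assumption for illustration, not necessarily the paper's exact notation): for a strictly increasing activation phi, let f(y) = integral from 0 to y of (phi^{-1}(t) - t) dt. The minimizer of f(y) + (y - x)^2 / 2 then satisfies phi^{-1}(y) - y + (y - x) = 0, i.e. y = phi(x). For tanh, f has the closed form y*atanh(y) + ln(1 - y^2)/2 - y^2/2. ReLU is the simpler case: it is the proximal operator of the indicator function of [0, inf), i.e. projection onto y >= 0. All function names here (`golden_section_min`, `tanh_penalty`, `prox_tanh`, `prox_relu`) are hypothetical helpers, not part of any LPOM code.

```python
import math


def golden_section_min(F, lo, hi, tol=1e-12, max_iter=200):
    """Minimize a unimodal function F on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = F(c), F(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:
            # Minimizer lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = F(c)
        else:
            # Minimizer lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = F(d)
    return (a + b) / 2.0


def tanh_penalty(y):
    """Assumed penalty f(y) = integral_0^y (atanh(t) - t) dt, for |y| < 1."""
    return y * math.atanh(y) + 0.5 * math.log1p(-y * y) - 0.5 * y * y


def prox_tanh(x, eps=1e-12):
    """argmin_y f(y) + (y - x)^2 / 2 over the open interval (-1, 1).

    The objective is strictly convex there, so golden-section search applies.
    """
    objective = lambda y: tanh_penalty(y) + 0.5 * (y - x) ** 2
    return golden_section_min(objective, -1.0 + eps, 1.0 - eps)


def prox_relu(x):
    """Prox of the indicator of [0, inf): Euclidean projection onto y >= 0."""
    return max(x, 0.0)


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.7, 2.5):
        print(f"x={x:+.1f}  prox_tanh={prox_tanh(x):+.8f}  "
              f"tanh={math.tanh(x):+.8f}  prox_relu={prox_relu(x):+.1f}")
```

Running the script prints each input next to the numerically computed proximal point and `math.tanh(x)`; the two columns should agree closely. Golden-section search is chosen only because it needs nothing but values of the objective. Note that the solve never evaluates the derivative of tanh, which mirrors the abstract's point that only the activation mapping (here through its inverse) is needed.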

Publication: arXiv e-prints
Pub Date: November 2018
DOI: 10.48550/arXiv.1811.01501
arXiv: arXiv:1811.01501
Bibcode: 2018arXiv181101501L
Keywords: Computer Science - Machine Learning; Computer Science - Artificial Intelligence; Mathematics - Optimization and Control; Statistics - Machine Learning
E-Print: Accepted by AAAI 2019
