Lifted Proximal Operator Machines
Abstract
We propose a new optimization method for training feed-forward neural networks. By rewriting the activation function as an equivalent proximal operator, we approximate a feed-forward neural network by adding the proximal operators to the objective function as penalties, hence we call the lifted proximal operator machine (LPOM). LPOM is block multi-convex in all layer-wise weights and activations. This allows us to use block coordinate descent to update the layer-wise weights and activations in parallel. Most notably, we only use the mapping of the activation function itself, rather than its derivatives, thus avoiding the gradient vanishing or blow-up issues in gradient based training methods. So our method is applicable to various non-decreasing Lipschitz continuous activation functions, which can be saturating and non-differentiable. LPOM does not require more auxiliary variables than the layer-wise activations, thus using roughly the same amount of memory as stochastic gradient descent (SGD) does. We further prove the convergence of updating the layer-wise weights and activations. Experiments on MNIST and CIFAR-10 datasets testify to the advantages of LPOM.
- Publication:
-
arXiv e-prints
- Pub Date:
- November 2018
- DOI:
- 10.48550/arXiv.1811.01501
- arXiv:
- arXiv:1811.01501
- Bibcode:
- 2018arXiv181101501L
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence;
- Mathematics - Optimization and Control;
- Statistics - Machine Learning
- E-Print:
- Accepted by AAAI 2019