Discriminative Bayesian filtering lends momentum to the stochastic Newton method for minimizing log-convex functions
Abstract
To minimize the average of a set of log-convex functions, the stochastic Newton method iteratively updates its estimate using subsampled versions of the full objective's gradient and Hessian. We contextualize this optimization problem as sequential Bayesian inference on a latent state-space model with a discriminatively-specified observation process. Applying Bayesian filtering then yields a novel optimization algorithm that considers the entire history of gradients and Hessians when forming an update. We establish matrix-based conditions under which the effect of older observations diminishes over time, in a manner analogous to Polyak's heavy ball momentum. We illustrate various aspects of our approach with an example and review other relevant innovations for the stochastic Newton method.
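As a rough illustration of the baseline the abstract starts from, the following sketches the plain stochastic Newton iteration, not the filtering-based algorithm the paper proposes. It assumes a hypothetical family of log-convex functions f_i(x) = exp(½‖x − c_i‖²), chosen only because its gradient and Hessian have simple closed forms. The function names, the uniform subsampling scheme, and the unit step size are likewise assumptions made for this example.

```python
import numpy as np


def subsampled_grad_hess(x, centers):
    """Gradient and Hessian of the average of f_i(x) = exp(0.5 * ||x - c_i||^2)
    over the given subsample of centers c_i (an assumed, illustrative family)."""
    d = x - centers                                  # (m, dim) offsets x - c_i
    f = np.exp(0.5 * np.sum(d * d, axis=1))          # (m,) function values f_i(x)
    grad = (f[:, None] * d).mean(axis=0)             # mean of f_i(x) * (x - c_i)
    outer = d[:, :, None] * d[:, None, :]            # (m, dim, dim) outer products
    # Each Hessian is f_i(x) * (I + (x - c_i)(x - c_i)^T), which is positive definite.
    hess = (f[:, None, None] * (np.eye(x.size) + outer)).mean(axis=0)
    return grad, hess


def stochastic_newton(x0, centers, batch_size, n_iters, seed=0):
    """Plain stochastic Newton: each step uses only the current subsample."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        # Draw a subsample of the functions without replacement.
        idx = rng.choice(len(centers), size=batch_size, replace=False)
        g, H = subsampled_grad_hess(x, centers[idx])
        # Newton update with the subsampled Hessian and gradient (unit step).
        x = x - np.linalg.solve(H, g)
    return x
```

Each update here depends only on the most recent subsample. The approach described in the abstract differs in that it takes the entire history of gradients and Hessians into account when forming an update.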
- Publication: arXiv e-prints
- Pub Date: April 2021
- DOI: 10.48550/arXiv.2104.12949
- arXiv: arXiv:2104.12949
- Bibcode: 2021arXiv210412949B
- Keywords: Statistics - Machine Learning; Computer Science - Machine Learning; Mathematics - Optimization and Control; 49M15; 90C15; 62M20 (Primary); 90C25 (Secondary)
- E-Print: to appear in: Optimization Letters (2022)