Information Newton's flow: second-order optimization method in probability space
Abstract
We introduce a framework for Newton's flows in probability space with information metrics, named information Newton's flows. Here two information metrics are considered, including both the Fisher-Rao metric and the Wasserstein-2 metric. A known fact is that overdamped Langevin dynamics correspond to Wasserstein gradient flows of Kullback-Leibler (KL) divergence. Extending this fact to Wasserstein Newton's flows, we derive Newton's Langevin dynamics. We provide examples of Newton's Langevin dynamics in both one-dimensional space and Gaussian families. For the numerical implementation, we design sampling efficient variational methods in affine models and reproducing kernel Hilbert space (RKHS) to approximate Wasserstein Newton's directions. We also establish convergence results of the proposed information Newton's method with approximated directions. Several numerical examples from Bayesian sampling problems are shown to demonstrate the effectiveness of the proposed method.
- Publication:
-
arXiv e-prints
- Pub Date:
- January 2020
- DOI:
- 10.48550/arXiv.2001.04341
- arXiv:
- arXiv:2001.04341
- Bibcode:
- 2020arXiv200104341W
- Keywords:
-
- Mathematics - Optimization and Control;
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- 62 pages