Variance reduction for Random Coordinate Descent-Langevin Monte Carlo
Abstract
Sampling from a log-concave distribution function is a core problem with wide applications in Bayesian statistics and machine learning. While most gradient-free methods converge slowly, Langevin Monte Carlo (LMC), which converges quickly, requires the computation of gradients. In practice one uses finite-difference approximations as surrogates, and the method is expensive in high dimensions. A natural strategy to reduce the computational cost of each iteration is to use random gradient approximations, such as random coordinate descent (RCD) or simultaneous perturbation stochastic approximation (SPSA). We show by a counterexample that blindly applying RCD does not achieve this goal in the most general setting. The high variance induced by the randomness means that a larger number of iterations is needed, and this cancels out the saving in each iteration. We then introduce a new variance reduction approach, termed Randomized Coordinates Averaging Descent (RCAD), and incorporate it into both overdamped and underdamped LMC. The resulting methods are termed RCAD-O-LMC and RCAD-U-LMC, respectively. The methods still sit in the random gradient approximation framework, so the computational cost of each iteration is low. However, by employing RCAD, the variance is reduced, so the methods converge within the same number of iterations as the classical overdamped and underdamped LMC. This leads to an overall computational saving.
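As an illustration of the random gradient approximation framework described in the abstract, the following is a minimal sketch of plain RCD combined with overdamped LMC, that is, the high-variance baseline and not the RCAD method itself, whose details are not given here. The function names `rcd_gradient` and `rcd_olmc`, the central-difference step `eps`, and the step size are assumptions made for this example. The target density is taken to be proportional to exp(-f(x)).

```python
import math
import random


def rcd_gradient(f, x, rng, eps=1e-5):
    """One-coordinate finite-difference surrogate for the gradient of f at x.

    A coordinate r is drawn uniformly at random and the vector
    d * (df/dx_r) * e_r is returned. Averaged over r, this equals the full
    gradient (up to finite-difference error), but a single draw is noisy.
    """
    d = len(x)
    r = rng.randrange(d)
    x_plus = list(x)
    x_minus = list(x)
    x_plus[r] += eps
    x_minus[r] -= eps
    # Central difference: two evaluations of f instead of 2 * d.
    partial = (f(x_plus) - f(x_minus)) / (2.0 * eps)
    g = [0.0] * d
    g[r] = d * partial
    return g


def rcd_olmc(f, x0, step, n_iters, seed=0):
    """Overdamped LMC with the RCD surrogate in place of the gradient.

    Update: x <- x - step * g + sqrt(2 * step) * xi, with xi standard normal.
    """
    rng = random.Random(seed)
    x = list(x0)
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_iters):
        g = rcd_gradient(f, x, rng)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    """f(x) = |x|^2 / 2, so exp(-f) is a standard Gaussian (log-concave)."""
    return 0.5 * sum(v * v for v in x)


if __name__ == "__main__":
    sample = rcd_olmc(quadratic, [1.0, 2.0, 3.0], step=0.05, n_iters=2000)
    print(sample)
```

Each iteration costs two evaluations of `f` regardless of the dimension, which is the per-iteration saving the abstract refers to; the price is the factor `d` multiplying a single partial derivative, which is the source of the high variance.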
Publication: arXiv e-prints
Pub Date: June 2020
arXiv: arXiv:2006.06068
Bibcode: 2020arXiv200606068D
Keywords: Statistics - Machine Learning; Computer Science - Machine Learning; Mathematics - Statistics Theory