The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data
Abstract
Standard MCMC methods can scale poorly to big data settings due to the need to evaluate the likelihood at each iteration. There have been a number of approximate MCMC algorithms that use subsampling ideas to reduce this computational burden, but with the drawback that these algorithms no longer target the true posterior distribution. We introduce a new family of Monte Carlo methods based upon a multidimensional version of the Zig-Zag process of Bierkens and Roberts (2017), a continuous-time piecewise deterministic Markov process. While traditional MCMC methods are reversible by construction (a property which is known to inhibit rapid convergence), the Zig-Zag process offers a flexible non-reversible alternative which we observe to often have favourable convergence properties. We show how the Zig-Zag process can be simulated without discretisation error, and give conditions for the process to be ergodic. Most importantly, we introduce a subsampling version of the Zig-Zag process that is an example of an exact approximate scheme, i.e. the resulting approximate process still has the posterior as its stationary distribution. Furthermore, if we use a control-variate idea to reduce the variance of our unbiased estimator, then the Zig-Zag process can be super-efficient: after an initial preprocessing step, essentially independent samples from the posterior distribution are obtained at a computational cost which does not depend on the size of the data.
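To make the construction concrete, the following is a minimal sketch (not the paper's implementation) of a one-dimensional Zig-Zag sampler for a standard Gaussian target. With potential U(x) = x^2/2, the switching rate is lambda(x, theta) = max(0, theta * x), and the next event time can be drawn by analytic inversion of the integrated rate, so the trajectory is simulated without discretisation error; the function name and arguments here are illustrative choices.

```python
import numpy as np

def zigzag_gaussian(T=10000.0, x0=0.0, theta0=1.0, seed=0):
    """Exact 1D Zig-Zag sampler for a standard Gaussian target.

    The particle moves with unit speed in direction theta in {-1, +1} and
    flips direction at events of an inhomogeneous Poisson process with rate
    lambda(x, theta) = max(0, theta * x).  Along a segment starting at
    (x, theta), the rate is max(0, a + s) with a = theta * x, and the first
    event time solves int_0^tau lambda ds = E for E ~ Exp(1), giving
    tau = sqrt(max(a, 0)^2 + 2 E) - a in closed form.
    """
    rng = np.random.default_rng(seed)
    x, theta = x0, theta0
    t = 0.0
    times, positions = [0.0], [x]
    while t < T:
        a = theta * x
        E = rng.exponential()                       # Exp(1) for time-change inversion
        tau = np.sqrt(max(a, 0.0) ** 2 + 2.0 * E) - a
        t += tau
        x += theta * tau                            # deterministic linear motion
        theta = -theta                              # flip velocity at the event
        times.append(t)
        positions.append(x)
    return np.array(times), np.array(positions)
```

Posterior expectations are then computed as time averages along the piecewise-linear path (e.g. the integral of x over each segment is the segment midpoint times its duration), rather than from discrete samples.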
 Publication:

arXiv e-prints
 Pub Date:
 July 2016
 arXiv:
 arXiv:1607.03188
 Bibcode:
 2016arXiv160703188B
 Keywords:

 Statistics - Computation;
 Mathematics - Probability;
 65C60;
 65C05;
 62F15;
 60J25
 E-Print:
 Ann. Statist., Volume 47, Number 3 (2019), 1288-1320