Control Variate-based Stochastic Sampling from the Probability Simplex

doi:10.48550/arXiv.2410.00845

Control Variate-based Stochastic Sampling from the Probability Simplex

This paper presents a control variate-based Markov chain Monte Carlo algorithm for efficient sampling from the probability simplex, with a focus on applications in large-scale Bayesian models such as latent Dirichlet allocation. Standard Markov chain Monte Carlo methods, particularly those based on Langevin diffusions, suffer from significant discretization errors near the boundaries of the simplex, which are exacerbated in sparse data settings. To address this issue, we propose an improved approach based on the stochastic Cox--Ingersoll--Ross process, which eliminates discretization errors and enables exact transition densities. Our key contribution is the integration of control variates, which significantly reduces the variance of the stochastic gradient estimator in the Cox--Ingersoll--Ross process, thereby enhancing the accuracy and computational efficiency of the algorithm. We provide a theoretical analysis showing the variance reduction achieved by the control variates approach and demonstrate the practical advantages of our method in data subsampling settings. Empirical results on large datasets show that the proposed method outperforms existing approaches in both accuracy and scalability.

Publication:

arXiv e-prints

Pub Date:

October 2024

DOI:

10.48550/arXiv.2410.00845

arXiv:

arXiv:2410.00845

Bibcode:

2024arXiv241000845B

Keywords:

Statistics - Methodology

NASA/ADS

Control Variate-based Stochastic Sampling from the Probability Simplex

Abstract