Scalable and Accurate Variational Bayes for High-Dimensional Binary Regression Models
Abstract
Modern methods for Bayesian regression beyond the Gaussian response setting are often computationally impractical or inaccurate in high dimensions. In fact, as discussed in recent literature, bypassing such a tradeoff is still an open problem even in routine binary regression models, and there is limited theory on the quality of variational approximations in high-dimensional settings. To address this gap, we study the approximation accuracy of routinely-used mean-field variational Bayes solutions in high-dimensional probit regression with Gaussian priors, obtaining novel and practically relevant results on the pathological behavior of such strategies in uncertainty quantification, point estimation and prediction. Motivated by these results, we further develop a new partially-factorized variational approximation for the posterior of the probit coefficients which leverages a representation with global and local variables but, unlike classical mean-field assumptions, avoids a fully factorized approximation and instead assumes a factorization only for the local variables. We prove that the resulting approximation belongs to a tractable class of unified skew-normal densities that crucially incorporates skewness and, unlike state-of-the-art mean-field solutions, converges to the exact posterior density as p goes to infinity. To solve the variational optimization problem, we derive a tractable CAVI algorithm that easily scales to p in the tens of thousands, and provably requires a number of iterations converging to 1 as p goes to infinity. Such findings are also illustrated in extensive empirical studies where our novel solution is shown to improve the approximation accuracy of mean-field variational Bayes for any n and p, with the magnitude of these gains being remarkable in those high-dimensional p > n settings where state-of-the-art methods are computationally impractical.
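For orientation, the mean-field baseline that the abstract analyzes can be sketched as follows. This is a minimal illustration of the classical mean-field CAVI scheme for probit regression with a N(0, nu2 I) prior, in which the coefficients and the latent Gaussian utilities are assumed independent under the approximation. It is not the partially-factorized method proposed in the paper, and the function name, the default prior variance nu2 and the stopping rule are assumptions made for this example.

```python
import math
import numpy as np

# Standard normal pdf and cdf, written with the standard library so that
# only NumPy is needed.
_erfc = np.vectorize(math.erfc)

def _phi(t):
    return np.exp(-0.5 * t ** 2) / math.sqrt(2.0 * math.pi)

def _Phi(t):
    return 0.5 * _erfc(-t / math.sqrt(2.0))

def mean_field_probit(X, y, nu2=25.0, max_iter=500, tol=1e-8):
    """Classical mean-field CAVI for probit regression (illustrative sketch).

    Model: z_i ~ N(x_i' beta, 1), y_i = 1(z_i > 0), beta ~ N(0, nu2 * I).
    Approximation: q(beta) q(z_1) ... q(z_n), with q(beta) = N(mu, V).
    """
    n, p = X.shape
    s = 2.0 * np.asarray(y, dtype=float) - 1.0   # +1 if y_i = 1, -1 if y_i = 0
    # The covariance of q(beta) does not depend on the data y or on q(z).
    V = np.linalg.inv(X.T @ X + np.eye(p) / nu2)
    mu = np.zeros(p)
    for it in range(1, max_iter + 1):
        m = X @ mu
        # Mean of a unit-variance normal centered at m, truncated to
        # (0, inf) when y_i = 1 and to (-inf, 0) when y_i = 0.
        z_bar = m + s * _phi(m) / np.maximum(_Phi(s * m), 1e-300)
        mu_new = V @ (X.T @ z_bar)
        converged = np.max(np.abs(mu_new - mu)) < tol
        mu = mu_new
        if converged:
            break
    return mu, V, it
```

Calling mean_field_probit(X, y) on an n-by-p design matrix X and a 0/1 response vector y returns the approximate posterior mean, the approximate posterior covariance and the number of iterations used. The explicit p-by-p inverse is kept for readability only; it is not meant to reflect how the paper achieves scalability for large p.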
 Publication:

arXiv e-prints
 Pub Date:
 November 2019
 DOI:
 10.48550/arXiv.1911.06743
 arXiv:
 arXiv:1911.06743
 Bibcode:
 2019arXiv191106743F
 Keywords:

 Statistics - Methodology;
 Statistics - Computation