Compressed Gaussian Process
Abstract
Nonparametric regression for massive numbers of samples (n) and features (p) is an increasingly important problem. In big-n settings, a common strategy is to partition the feature space and then separately apply simple models to each partition set. We propose an alternative approach, which avoids such partitioning and the associated sensitivity to neighborhood choice and distance metrics, by using random compression combined with Gaussian process regression. The proposed approach is particularly motivated by the setting in which the response is conditionally independent of the features given the projection to a low-dimensional manifold. Conditional on the random compression matrix and a smoothness parameter, the posterior distribution for the regression surface and the posterior predictive distributions are available analytically. The analysis is run in parallel for many random compression matrices and smoothness parameters, and model averaging is used to combine the results. The algorithm can be implemented rapidly even in very big n and p problems, has strong theoretical justification, and is found to yield state-of-the-art predictive performance.
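The pipeline described above (compress, fit a closed-form GP per compression and smoothness value, then average) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the compression matrix Φ is drawn as a Gaussian matrix with orthonormalized rows, the smoothness parameter is taken to be the length scale of a squared-exponential kernel, the noise variance is fixed, and the fits are combined with weights proportional to each fit's marginal likelihood (a Bayesian model averaging choice assumed here). The helper names `compression_matrix`, `sq_exp_kernel`, `gp_fit_predict`, and `compressed_gp` are hypothetical. This toy version still solves an n × n system per fit, so it shows the structure of the method rather than its scalability.

```python
import numpy as np


def compression_matrix(m, p, rng):
    """Draw an m x p random compression matrix Phi.

    Assumption: Gaussian entries with rows orthonormalized via QR,
    so that Phi @ Phi.T is the m x m identity.
    """
    q, _ = np.linalg.qr(rng.standard_normal((p, m)))  # p x m, orthonormal columns
    return q.T


def sq_exp_kernel(a, b, length_scale):
    """Squared-exponential kernel with unit prior variance.

    Assumption: length_scale stands in for the smoothness parameter.
    """
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def gp_fit_predict(z, y, z_new, length_scale, noise_var):
    """Closed-form GP regression on (compressed) inputs z.

    Zero-mean prior, so y should be centered. Returns the posterior
    predictive mean and variance at z_new and the log marginal likelihood.
    """
    n = len(y)
    k = sq_exp_kernel(z, z, length_scale) + noise_var * np.eye(n)
    chol = np.linalg.cholesky(k)                                # K = L L^T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))   # K^{-1} y
    k_star = sq_exp_kernel(z_new, z, length_scale)
    mean = k_star @ alpha
    v = np.linalg.solve(chol, k_star.T)
    var = 1.0 + noise_var - np.sum(v**2, axis=0)                # predictive variance of y_new
    log_ml = (-0.5 * y @ alpha - np.sum(np.log(np.diag(chol)))
              - 0.5 * n * np.log(2.0 * np.pi))
    return mean, var, log_ml


def compressed_gp(x, y, x_new, m, length_scales, n_draws, noise_var=0.1, seed=0):
    """Average GP fits over random compressions and length scales.

    Assumption: model weights are proportional to each fit's marginal
    likelihood. Every (Phi, length_scale) fit is independent, so this
    double loop is embarrassingly parallel.
    """
    rng = np.random.default_rng(seed)
    means, variances, log_mls = [], [], []
    for _ in range(n_draws):
        phi = compression_matrix(m, x.shape[1], rng)
        z, z_new = x @ phi.T, x_new @ phi.T                     # n x m and n_new x m
        for ls in length_scales:
            mu, var, lml = gp_fit_predict(z, y, z_new, ls, noise_var)
            means.append(mu)
            variances.append(var)
            log_mls.append(lml)
    log_mls = np.array(log_mls)
    w = np.exp(log_mls - log_mls.max())                         # numerically stable normalization
    w /= w.sum()
    means, variances = np.array(means), np.array(variances)
    mean = w @ means
    # Mixture variance: weighted within-fit variance plus spread of the means.
    var = w @ (variances + means**2) - mean**2
    return mean, var, w
```

Because each (Φ, length scale) fit depends only on its own draw, the double loop could be distributed across workers, with only the small per-fit outputs gathered for the averaging step. With m = p the compression is an orthogonal rotation, and since the squared-exponential kernel depends only on distances, the fit then reduces to an ordinary GP on the original features.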
- Publication: arXiv e-prints
- Pub Date: June 2014
- DOI: 10.48550/arXiv.1406.1916
- arXiv: arXiv:1406.1916
- Bibcode: 2014arXiv1406.1916G
- Keywords: Statistics - Machine Learning
- E-Print: 33 pages, 8 figures