Submodularity in Batch Active Learning and Survey Problems on Gaussian Random Fields

doi:10.48550/arXiv.1209.3694


Many real-world datasets can be represented as a graph whose edge weights encode similarities between instances. A discrete Gaussian random field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior covariance is the inverse of a graph Laplacian. Minimizing the trace of the predictive covariance $\Sigma$ (V-optimality) on GRFs has proven successful in batch active learning for classification under budget constraints. However, a worst-case bound for this criterion has been missing. We show that V-optimality on GRFs, as a function of the batch query set, is submodular, and hence that greedy selection guarantees a $(1-1/e)$ approximation ratio. Moreover, GRF models satisfy the absence-of-suppressor (AofS) condition. For active survey problems, we propose a similar survey criterion that minimizes $\mathbf{1}^\top \Sigma \mathbf{1}$. In practice, the V-optimality criterion performs better than GPs with mutual information gain criteria and allows nonuniform costs for different nodes.
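The greedy selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation, only a minimal NumPy version under stated assumptions: the Laplacian is regularized as L + delta*I so that it can be inverted (the `delta` parameter is an assumption), labels are observed without noise, all nodes have unit cost, and the names `grf_prior_covariance` and `greedy_select` are hypothetical. Conditioning on a node v changes the covariance by a rank-one term, so the trace drops by sum_i C[i,v]^2 / C[v,v] and the survey criterion drops by (sum_i C[i,v])^2 / C[v,v].

```python
import numpy as np


def grf_prior_covariance(W, delta=1e-2):
    """Prior covariance (L + delta*I)^{-1} for a symmetric weight matrix W.

    The delta*I regularizer is an assumption of this sketch; the plain
    graph Laplacian is singular and cannot be inverted directly.
    """
    W = np.asarray(W, dtype=float)
    laplacian = np.diag(W.sum(axis=1)) - W
    return np.linalg.inv(laplacian + delta * np.eye(W.shape[0]))


def greedy_select(C, budget, criterion="v_opt"):
    """Greedily pick `budget` nodes to query, starting from covariance C.

    criterion="v_opt"  reduces trace(Sigma);
    criterion="survey" reduces 1' Sigma 1.
    Returns the picked nodes, the gain of each pick, and the posterior
    covariance after conditioning on all picked nodes.
    """
    C = np.array(C, dtype=float)
    n = C.shape[0]
    if budget > n:
        raise ValueError("budget exceeds the number of nodes")
    available = np.ones(n, dtype=bool)
    picked, gains = [], []
    for _ in range(budget):
        if criterion == "v_opt":
            numer = (C ** 2).sum(axis=0)   # squared column norms
        elif criterion == "survey":
            numer = C.sum(axis=0) ** 2     # squared column sums
        else:
            raise ValueError("criterion must be 'v_opt' or 'survey'")
        # Gain is only defined for nodes that have not been queried yet.
        gain = np.full(n, -np.inf)
        gain[available] = numer[available] / np.diag(C)[available]
        v = int(np.argmax(gain))
        picked.append(v)
        gains.append(float(gain[v]))
        available[v] = False
        # Rank-one update: condition the Gaussian on a noise-free label at v.
        C = C - np.outer(C[:, v], C[v, :]) / C[v, v]
    return picked, gains, C


# Toy example: a path graph on 5 nodes with unit edge weights.
W = np.diag(np.ones(4), 1)
W = W + W.T
C0 = grf_prior_covariance(W)
picked, gains, C_post = greedy_select(C0, budget=3)
print(picked, np.trace(C0), np.trace(C_post))
```

The rank-one update avoids re-inverting the precision submatrix for every candidate node, so each greedy step costs O(n^2) in this sketch. Under the submodularity result stated above, the gains of successive greedy picks should be non-increasing.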

Publication:

arXiv e-prints

Pub Date:

September 2012

DOI:

10.48550/arXiv.1209.3694

arXiv:

arXiv:1209.3694

Bibcode:

2012arXiv1209.3694M

Keywords:

Computer Science - Machine Learning;
Computer Science - Artificial Intelligence;
Computer Science - Data Structures and Algorithms
