BayesOptimal Entropy Pursuit for Active ChoiceBased Preference Learning
Abstract
We analyze the problem of learning a single user's preferences in an active learning setting, sequentially and adaptively querying the user over a finite time horizon. Learning is conducted via choicebased queries, where the user selects her preferred option among a small subset of offered alternatives. These queries have been shown to be a robust and efficient way to learn an individual's preferences. We take a parametric approach and model the user's preferences through a linear classifier, using a Bayesian prior to encode our current knowledge of this classifier. The rate at which we learn depends on the alternatives offered at every time epoch. Under certain noise assumptions, we show that the Bayesoptimal policy for maximally reducing entropy of the posterior distribution of this linear classifier is a greedy policy, and that this policy achieves a linear lower bound when alternatives can be constructed from the continuum. Further, we analyze a different metric called misclassification error, proving that the performance of the optimal policy that minimizes misclassification error is bounded below by a linear function of differential entropy. Lastly, we numerically compare the greedy entropy reduction policy with a knowledge gradient policy under a number of scenarios, examining their performance under both differential entropy and misclassification error.
 Publication:

arXiv eprints
 Pub Date:
 February 2017
 arXiv:
 arXiv:1702.07694
 Bibcode:
 2017arXiv170207694P
 Keywords:

 Statistics  Machine Learning;
 Computer Science  Information Theory;
 Computer Science  Machine Learning