Maximum entropy models and subjective interestingness: an application to tiles in binary databases
Abstract
Recent research has highlighted the practical benefits of subjective interestingness measures, which quantify the novelty or unexpectedness of a pattern relative to any prior information the data miner holds (Silberschatz and Tuzhilin, 1995; Geng and Hamilton, 2006). A key challenge is to formalize this prior information in a way that supports the definition of a subjective interestingness measure that is both meaningful and practical. In this paper, we outline a general strategy for achieving this and then work out the details for a use case that is important in its own right. Our strategy treats prior information as constraints on a probabilistic model that represents the uncertainty about the data. More specifically, we represent the prior information by the maximum entropy (MaxEnt) distribution subject to these constraints. We briefly outline various measures that could subsequently be used to contrast patterns with this MaxEnt model, thus quantifying their subjective interestingness.
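The abstract leaves the form of the prior information open. As a minimal sketch, assume (for illustration only) that the data miner's prior information consists of the row and column sums of a binary database. Under such constraints the MaxEnt distribution factorizes into independent Bernoulli cells with P(d_ij = 1) = σ(λ_i + μ_j), where λ_i and μ_j are the Lagrange multipliers of the row and column constraints. Taking a tile to be a set of rows and columns whose cells are all 1, one candidate way to contrast it with the model is its self-information, the sum of -log₂ P(d_ij = 1) over the tile's cells: the less probable the tile under the prior, the more surprising it is. The function names, the gradient-descent fitting routine, and the toy data are hypothetical choices, not the paper's method.

```python
import math

# Hypothetical helper names; the abstract does not prescribe an implementation.


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_maxent_margins(data, iters=20000, tol=1e-10):
    """Fit the MaxEnt model whose expected row/column sums equal those of `data`.

    Assumption: the prior information is exactly the row and column sums, so the
    MaxEnt distribution factorizes into independent Bernoulli cells with
    P(d_ij = 1) = sigmoid(lam[i] + mu[j]). The multipliers are found by gradient
    descent on the convex Lagrange dual. Rows or columns that are all 0 or all 1
    have no finite solution and are not handled here.
    """
    m, n = len(data), len(data[0])
    row_sums = [sum(row) for row in data]
    col_sums = [sum(data[i][j] for i in range(m)) for j in range(n)]
    lam = [0.0] * m
    mu = [0.0] * n
    # The dual's curvature is at most (m + n) / 4, so 4 / (m + n) is a safe step.
    step = 4.0 / (m + n)
    for _ in range(iters):
        p = [[sigmoid(lam[i] + mu[j]) for j in range(n)] for i in range(m)]
        # Dual gradient: expected margin minus observed margin.
        g_row = [sum(p[i]) - row_sums[i] for i in range(m)]
        g_col = [sum(p[i][j] for i in range(m)) - col_sums[j] for j in range(n)]
        if max(abs(g) for g in g_row + g_col) < tol:
            break
        lam = [lam[i] - step * g_row[i] for i in range(m)]
        mu = [mu[j] - step * g_col[j] for j in range(n)]
    return [[sigmoid(lam[i] + mu[j]) for j in range(n)] for i in range(m)]


def tile_information(p, data, rows, cols):
    """Self-information, in bits, of the all-ones tile rows x cols under model p."""
    for i in rows:
        for j in cols:
            if data[i][j] != 1:
                raise ValueError(f"cell ({i}, {j}) is 0, so this is not a tile")
    return -sum(math.log2(p[i][j]) for i in rows for j in cols)


# Toy binary database (made up for illustration).
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
p = fit_maxent_margins(data)
info = tile_information(p, data, rows=[0, 1, 2], cols=[0, 1])
print(f"tile information: {info:.3f} bits")
```

Plain gradient descent on the dual keeps the sketch dependency-free; a Newton or iterative-scaling solver would typically converge faster on larger databases.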
- Publication:
- arXiv e-prints
- Pub Date:
- August 2010
- DOI:
- 10.48550/arXiv.1008.3314
- arXiv:
- arXiv:1008.3314
- Bibcode:
- 2010arXiv1008.3314D
- Keywords:
- Computer Science - Artificial Intelligence
- E-Print:
- 43 pages, submitted