Computing Exact Clustering Posteriors with Subset Convolution
Abstract
An exponential-time exact algorithm is provided for the task of clustering n items of data into k clusters. Instead of seeking one partition, posterior probabilities are computed for summary statistics: the number of clusters, and pairwise co-occurrence. The method is based on subset convolution, and yields the posterior distribution for the number of clusters in O(n * 3^n) operations, or O(n^3 * 2^n) using fast subset convolution. Pairwise co-occurrence probabilities are then obtained in O(n^3 * 2^n) operations. This is considerably faster than exhaustive enumeration of all partitions.
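To illustrate the O(n * 3^n) variant, here is a minimal sketch of a direct (non-fast) subset convolution. It is not the authors' implementation. It assumes a product-form model, in which the unnormalized posterior weight of a partition is the product of per-cluster scores. The score table `f` and both function names are hypothetical, introduced only for this example.

```python
def cluster_count_weights(n, f):
    """Total unnormalized weight of all partitions of n items into k clusters.

    Assumption (not from the paper): a partition's weight is the product of
    per-cluster scores f[mask], where bit i of mask marks item i as a member.
    f must have length 2**n; f[0] is never read.
    Returns a list w with w[k] = summed weight over partitions into k clusters.
    """
    size = 1 << n
    full = size - 1
    # g[k][mask] = summed weight of partitions of the item set `mask`
    # into exactly k clusters.
    g = [[0.0] * size for _ in range(n + 1)]
    g[0][0] = 1.0  # the empty set has exactly one partition, with zero clusters
    for mask in range(1, size):
        low = mask & -mask   # lowest item in mask
        rest = mask ^ low    # all other items in mask
        # Choose the cluster that contains the lowest item. Fixing that item
        # counts each partition once, not once per ordering of its clusters.
        sub = rest
        while True:
            block = sub | low
            remainder = mask ^ block  # strictly smaller than mask, already done
            weight = f[block]
            for k in range(1, n + 1):
                g[k][mask] += weight * g[k - 1][remainder]
            if sub == 0:
                break
            sub = (sub - 1) & rest  # next submask of rest
    return [g[k][full] for k in range(n + 1)]


def cluster_count_posterior(n, f):
    """Normalize the weights into a distribution over the number of clusters."""
    weights = cluster_count_weights(n, f)
    total = sum(weights)
    return [w / total for w in weights]
```

With every cluster score set to 1, the weights reduce to counts of partitions (Stirling numbers of the second kind), which gives a simple sanity check. Each item set is paired with every subset of its remaining items, so the loop visits on the order of 3^n (set, subset) pairs, and the inner loop over k contributes the factor n.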
- Publication: arXiv e-prints
- Pub Date: October 2013
- arXiv: arXiv:1310.1034
- Bibcode: 2013arXiv1310.1034K
- Keywords: Statistics - Computation; Statistics - Methodology
- E-Print: 6 figures