Computing Exact Clustering Posteriors with Subset Convolution

doi:10.48550/arXiv.1310.1034

An exponential-time exact algorithm is provided for the task of clustering n items of data into k clusters. Instead of seeking one partition, posterior probabilities are computed for summary statistics: the number of clusters, and pairwise co-occurrence. The method is based on subset convolution, and yields the posterior distribution for the number of clusters in O(n * 3^n) operations, or O(n^3 * 2^n) using fast subset convolution. Pairwise co-occurrence probabilities are then obtained in O(n^3 * 2^n) operations. This is considerably faster than exhaustive enumeration of all partitions.
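As a rough illustration of the O(n * 3^n) route to the posterior over the number of clusters, the following Python sketch uses plain (not fast) subset convolution. It is not the paper's implementation. It assumes the unnormalized posterior of a partition factorizes into a product of per-cluster weights, supplied here by a hypothetical function `f` that takes a subset encoded as a bitmask. Subsets are enumerated so that each block always contains the lowest-indexed remaining item, which counts every unordered partition exactly once.

```python
def cluster_count_weights(n, f):
    """Unnormalized weight of "exactly k clusters" for k = 1..n.

    n -- number of items (subsets are bitmasks over n bits)
    f -- hypothetical per-cluster weight: f(mask) >= 0 for each non-empty mask
    """
    size = 1 << n
    full = size - 1
    # prev[S] = total weight of partitions of S into exactly k-1 blocks
    prev = [0.0] * size
    prev[0] = 1.0  # the empty set has one partition, with zero blocks
    totals = []
    for _k in range(1, n + 1):
        cur = [0.0] * size
        for S in range(1, size):
            low = S & -S      # lowest-indexed item of S
            rest = S ^ low    # remaining items of S
            R = rest
            while True:       # enumerate every subset R of rest
                T = low | R   # candidate block containing the lowest item
                cur[S] += f(T) * prev[S ^ T]
                if R == 0:
                    break
                R = (R - 1) & rest
        totals.append(cur[full])
        prev = cur
    return totals


def cluster_count_posterior(n, f):
    """Normalize the weights into a posterior over the number of clusters."""
    totals = cluster_count_weights(n, f)
    z = sum(totals)
    return [t / z for t in totals]


if __name__ == "__main__":
    # With f = 1 for every block, the weights are Stirling numbers of the
    # second kind; for n = 4 these are 1, 7, 6, 1 (summing to 15).
    print(cluster_count_weights(4, lambda mask: 1.0))
    print(cluster_count_posterior(4, lambda mask: 1.0))
```

Each of the n passes visits every (set, subset) pair once, which gives roughly n * 3^n multiplications in total. Replacing the inner subset loop with fast subset convolution is what would bring the cost down to the O(n^3 * 2^n) bound quoted above; that step is not shown here.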

Publication:

arXiv e-prints

Pub Date:

October 2013

arXiv:

arXiv:1310.1034

Bibcode:

2013arXiv1310.1034K

Keywords:

Statistics - Computation;
Statistics - Methodology

E-Print:

6 figures
