Scaling analysis of affinity propagation
Abstract
We analyze and exploit some scaling properties of the affinity propagation (AP) clustering algorithm proposed by Frey and Dueck [Science 315, 972 (2007)]. Following a divide and conquer strategy we setup an exact renormalization-based approach to address the question of clustering consistency, in particular, how many cluster are present in a given data set. We first observe that the divide and conquer strategy, used on a large data set hierarchically reduces the complexity O(N2) to O(N(h+2)/(h+1)) , for a data set of size N and a depth h of the hierarchical strategy. For a data set embedded in a d -dimensional space, we show that this is obtained without notably damaging the precision except in dimension d=2 . In fact, for d larger than 2 the relative loss in precision scales such as N(2-d)/(h+1)d . Finally, under some conditions we observe that there is a value s∗ of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for s<s∗ ) from a coalescent one (for s>s∗ ) of the underlying hidden cluster structure. At this precise point holds a self-similarity property which can be exploited by the hierarchical strategy to actually locate its position, as a result of an exact decimation procedure. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given data set.
- Publication:
-
Physical Review E
- Pub Date:
- June 2010
- DOI:
- 10.1103/PhysRevE.81.066102
- arXiv:
- arXiv:0910.1800
- Bibcode:
- 2010PhRvE..81f6102F
- Keywords:
-
- 89.75.Da;
- Systems obeying scaling laws;
- Computer Science - Artificial Intelligence;
- Condensed Matter - Statistical Mechanics
- E-Print:
- 28 pages, 14 figures, Inria research report