Datasets as Interacting Particle Systems: a Framework for Clustering
Abstract
In this paper we propose a framework, inspired by interacting particle physics, for clustering multidimensional datasets. A given dataset is modeled as an interacting particle system, under the assumption that each element of the dataset corresponds to a distinct particle and that particle interactions are rendered through Gaussian potentials. The way particle interactions are evaluated depends on a parameter that controls the shape of the underlying Gaussian model. In principle, different clusters of proximal particles can be identified, depending on the value adopted for this parameter. This degree of freedom in the Gaussian potentials has been introduced to allow multiresolution analysis. In particular, given a standard community detection algorithm, multiresolution analysis is put into practice by repeatedly running the algorithm on a set of adjacency matrices, each dependent on a specific value of the parameter that controls the shape of the Gaussian potentials. As a result, different partitioning schemas are obtained on the given dataset, so that the information therein can be better highlighted, with the goal of identifying the most appropriate number of clusters. Solutions obtained on synthetic datasets made it possible to identify a recurring pattern, which appears to be useful for identifying optimal solutions when analysing other synthetic and real datasets.
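The multiresolution procedure described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the paper's formulation: the pairwise weight is taken to be exp(-d^2 / (2 sigma^2)), with `sigma` standing in for the shape parameter, and connected components of a thresholded graph are used as a simple stand-in for the unspecified "standard community detection algorithm". The function names and the `cutoff` value are hypothetical.

```python
import math


def gaussian_adjacency(points, sigma):
    """Build a symmetric adjacency matrix with weights exp(-d^2 / (2 sigma^2)).

    Assumed form of the Gaussian interaction; the abstract does not give
    the exact expression used by the authors.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def threshold_components(w, cutoff=0.5):
    """Stand-in for community detection (NOT the algorithm used in the paper).

    Labels the connected components of the graph that keeps only edges
    whose weight is at least `cutoff`.
    """
    n = len(w)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and w[i][j] >= cutoff:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def multiresolution_sweep(points, sigmas, cutoff=0.5):
    """Run the partitioning step once per value of the shape parameter."""
    return {
        sigma: threshold_components(gaussian_adjacency(points, sigma), cutoff)
        for sigma in sigmas
    }


# Toy dataset: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
partitions = multiresolution_sweep(points, sigmas=[0.01, 0.5, 10.0])
cluster_counts = {s: len(set(lab)) for s, lab in partitions.items()}
print(cluster_counts)
```

On this toy dataset, a very narrow Gaussian isolates every point, an intermediate width recovers the two groups, and a very wide one merges everything into a single cluster. This is the kind of parameter-dependent partitioning the abstract refers to, although how the paper selects the most appropriate number of clusters is not specified here.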
 Publication:

arXiv e-prints
 Pub Date:
 January 2012
 arXiv:
 arXiv:1202.0077
 Bibcode:
 2012arXiv1202.0077A
 Keywords:

 Condensed Matter - Statistical Mechanics;
 Computer Science - Social and Information Networks;
 Physics - Physics and Society
 EPrint:
 13 pages, 5 figures. Submitted to ACS - Advances in Complex Systems