Spectral redemption in clustering sparse networks
Abstract
Spectral algorithms are widely applied to data clustering problems, including finding communities or partitions in graphs and networks. We propose a way of encoding sparse data using a "nonbacktracking" matrix, and show that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model. This is in contrast with classical spectral algorithms, based on the adjacency matrix, random walk matrix, and graph Laplacian, which perform poorly in the sparse case, failing significantly above a recently discovered phase transition for the detectability of communities. Further support for the method is provided by experiments on real networks as well as by theoretical arguments and analogies from probability theory, statistical physics, and the theory of random matrices.
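The abstract names the "nonbacktracking" matrix but does not define it. As a minimal sketch, assuming the standard definition (rows and columns indexed by directed edges, with an entry of 1 when one edge feeds into the next without immediately reversing), it could be built as follows. The function name `nonbacktracking_matrix` and the dense NumPy construction are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def nonbacktracking_matrix(edges):
    """Build a dense nonbacktracking matrix for an undirected simple graph.

    Assumed definition: B[(u -> v), (w -> x)] = 1 if v == w and x != u,
    and 0 otherwise. `edges` is a list of undirected pairs (u, v).
    """
    # Each undirected edge contributes two directed edges.
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}

    B = np.zeros((len(directed), len(directed)))
    for (u, v), i in index.items():
        for (w, x), j in index.items():
            # Step from u -> v onto v -> x, forbidding the return to u.
            if v == w and x != u:
                B[i, j] = 1.0
    return B, directed


# Illustration on the complete graph K4 (every vertex has degree 3).
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
B, directed = nonbacktracking_matrix(k4_edges)

# The spectrum is complex in general because B is not symmetric.
eigenvalues = np.linalg.eigvals(B)
leading = max(eigenvalues.real)
print(B.shape, leading)
```

The double loop is quadratic in the number of edges and is meant only to make the definition concrete; for a sparse network of realistic size, a sparse matrix built from adjacency lists would be the more sensible choice.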
 Publication:

Proceedings of the National Academy of Sciences
 Pub Date:
 December 2013
 DOI:
 10.1073/pnas.1312486110
 arXiv:
 arXiv:1306.5550
 Bibcode:
 2013PNAS..11020935K
 Keywords:

 Computer Science - Social and Information Networks;
 Condensed Matter - Statistical Mechanics;
 Physics - Physics and Society;
 Statistics - Machine Learning
 EPrint:
 11 pages, 6 figures. Clarified to what extent our claims are rigorous, and to what extent they are conjectures