Spectral redemption in clustering sparse networks
Abstract
Spectral algorithms are widely applied to data clustering problems, including finding communities or partitions in graphs and networks. We propose a way of encoding sparse data using a "nonbacktracking" matrix, and show that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model. This is in contrast with classical spectral algorithms, based on the adjacency matrix, random walk matrix, and graph Laplacian, which perform poorly in the sparse case, failing significantly above a recently discovered phase transition for the detectability of communities. Further support for the method is provided by experiments on real networks as well as by theoretical arguments and analogies from probability theory, statistical physics, and the theory of random matrices.
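The abstract does not define the "nonbacktracking" matrix. As an illustration only, the sketch below assumes the standard definition from the literature: the matrix is indexed by the 2m directed edges of an undirected graph, with entry 1 for the pair (u→v, w→x) when v = w and x ≠ u, and 0 otherwise. The function name `nonbacktracking_matrix` and the edge-list input format are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np


def nonbacktracking_matrix(edges):
    """Build a nonbacktracking matrix for an undirected simple graph.

    `edges` is a list of (u, v) pairs, each undirected edge listed once.
    Returns (B, directed), where `directed` lists the 2m directed edges
    in the order used to index the rows and columns of B.
    Assumed definition: B[(u->v), (w->x)] = 1 if v == w and x != u.
    """
    directed = []
    for u, v in edges:
        directed.append((u, v))
        directed.append((v, u))

    size = len(directed)
    B = np.zeros((size, size))
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            # A step u->v may continue along v->x unless it returns to u.
            if v == w and x != u:
                B[i, j] = 1.0
    return B, directed


if __name__ == "__main__":
    # Toy graph: a triangle on vertices 0, 1, 2.
    B, directed = nonbacktracking_matrix([(0, 1), (1, 2), (2, 0)])
    print(directed)
    print(B)
    # B is not symmetric, so its eigenvalues are complex in general.
    print(np.linalg.eigvals(B))
```

The dense double loop is meant for small toy graphs only; a sparse representation would be needed for networks of realistic size.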
- Publication:
- Proceedings of the National Academy of Sciences
- Pub Date:
- December 2013
- DOI:
- 10.1073/pnas.1312486110
- arXiv:
- arXiv:1306.5550
- Bibcode:
- 2013PNAS..11020935K
- Keywords:
- Computer Science - Social and Information Networks;
- Condensed Matter - Statistical Mechanics;
- Physics - Physics and Society;
- Statistics - Machine Learning
- E-Print:
- 11 pages, 6 figures. Clarified to what extent our claims are rigorous, and to what extent they are conjectures