Scalable and Robust Community Detection with Randomized Sketching
Abstract
This article explores and analyzes the unsupervised clustering of large partially observed graphs. We propose a scalable and provable randomized framework for clustering graphs generated from the stochastic block model. The clustering is first applied to a sub-matrix of the graph's adjacency matrix associated with a reduced graph sketch constructed using random sampling. Then, the clusters of the full graph are inferred based on the clusters extracted from the sketch using a correlation-based retrieval step. Uniform random node sampling is shown to improve the computational complexity over clustering of the full graph when the cluster sizes are balanced. A new random degree-based node sampling algorithm is presented which significantly improves upon the performance of the clustering algorithm even when clusters are unbalanced. This framework improves the phase transitions for matrix-decomposition-based clustering with regard to computational complexity and minimum cluster size, which are shown to be nearly dimension-free in the low inter-cluster connectivity regime. A third sampling technique is shown to improve balance by randomly sampling nodes based on spatial distribution. We provide analysis and numerical results using a convex clustering algorithm based on matrix completion.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2018
- DOI:
- 10.48550/arXiv.1805.10927
- arXiv:
- arXiv:1805.10927
- Bibcode:
- 2018arXiv180510927R
- Keywords:
-
- Computer Science - Social and Information Networks;
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- IEEE Transactions on Signal Processing, vol. 68, pp. 962-977, 2020