Effective Resistance-based Germination of Seed Sets for Community Detection
Abstract
Community detection is, at its core, an attempt to attach an interpretable function to an otherwise indecipherable form. Labeling communities has obvious implications for identifying clusters in social networks, and it has equally relevant applications in product recommendations, biological systems, and many forms of classification. The local variety of community detection starts with a small set of labeled seed nodes and aims to estimate the community containing these nodes. One of the most widely used methods, owing to its simplicity and efficiency, is personalized PageRank. The most obvious bottleneck for deploying this form of PageRank successfully is the quality of the seeds. We introduce a "germination" stage for these seeds, in which an effective resistance-based approach is used to increase the quality and number of seeds from which a community is detected. By breaking seed set expansion into a two-step process, we aim to use two distinct random walk-based approaches in the regimes in which they excel. On synthetic and real network data, a simple greedy algorithm that minimizes the effective resistance diameter, combined with PageRank, achieves clear improvements in precision and recall over a standalone PageRank procedure.
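The two-step process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`effective_resistance_matrix`, `germinate`, `personalized_pagerank`), the dense pseudoinverse computation, the stopping rule (`target_size`), the teleport parameter `alpha`, and the toy graph are all assumptions made for illustration. It assumes a small, connected, undirected graph given as a dense adjacency matrix.

```python
import numpy as np

def effective_resistance_matrix(A):
    """All-pairs effective resistance from a dense symmetric adjacency matrix."""
    L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
    Lp = np.linalg.pinv(L)                  # Moore-Penrose pseudoinverse
    d = np.diag(Lp)
    return d[:, None] + d[None, :] - 2 * Lp  # R_uv = Lp_uu + Lp_vv - 2 Lp_uv

def germinate(R, seeds, target_size):
    """Greedily grow the seed set (hypothetical stopping rule: a fixed size).

    Each step adds the node whose largest resistance to the current set is
    smallest, which is the node that minimizes the effective resistance
    diameter of the enlarged set.
    """
    grown = list(seeds)
    candidates = set(range(R.shape[0])) - set(grown)
    while len(grown) < target_size and candidates:
        best = min(sorted(candidates), key=lambda v: R[v, grown].max())
        grown.append(best)
        candidates.remove(best)
    return grown

def personalized_pagerank(A, seeds, alpha=0.15, iters=200):
    """Power iteration for PageRank restarted uniformly on the seed set."""
    P = A / A.sum(axis=1, keepdims=True)    # row-stochastic; assumes no isolated nodes
    s = np.zeros(A.shape[0])
    s[list(seeds)] = 1.0 / len(seeds)
    p = s.copy()
    for _ in range(iters):
        p = alpha * s + (1 - alpha) * p @ P
    return p

# Toy graph (an assumption): two 5-node cliques joined by the single edge 4-5.
A = np.zeros((10, 10))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
A[4, 5] = A[5, 4] = 1.0

R = effective_resistance_matrix(A)
grown = germinate(R, seeds=[0], target_size=3)   # germination stage
scores = personalized_pagerank(A, grown)         # detection stage
```

In this sketch the germinated seeds stay inside the clique that contains the original seed, because crossing the bridge edge adds a full unit of resistance. A community estimate would then be read off by ranking or thresholding `scores`; the dense pseudoinverse is only practical for small graphs.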
- Publication: arXiv e-prints
- Pub Date: October 2018
- DOI: 10.48550/arXiv.1811.12162
- arXiv: arXiv:1811.12162
- Bibcode: 2018arXiv181112162E
- Keywords: Computer Science - Social and Information Networks; Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: 10 pages, 4 figures, currently under review for conference submission