Community Recovery in Hypergraphs
Abstract
Community recovery is a central problem that arises in a wide variety of applications such as network clustering, motion segmentation, face clustering and protein complex detection. The objective is to cluster data points into distinct communities based on a set of measurements, each of which depends on the values of a certain number of data points. While most prior work focuses on the setting in which each measurement involves two data points, this work explores a generalized setting in which a measurement can involve more than two. Motivated by applications in machine learning and channel coding, we consider two types of measurements: (1) homogeneity measurements, which indicate whether or not the associated data points belong to the same community; (2) parity measurements, which give the modulo-2 sum of the values of the data points. Such measurements may be corrupted by Bernoulli noise. We characterize the fundamental limits on the number of measurements required to reconstruct the communities for the considered models.
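The two measurement types described in the abstract can be illustrated with a small simulation. The following is only a minimal sketch under assumptions that are not taken from the paper: data points carry binary labels (two communities), each measurement involves `d` points drawn uniformly at random, and Bernoulli noise flips each outcome independently with probability `theta`. The function name `sample_measurements` and all of its parameters are hypothetical.

```python
import random


def sample_measurements(labels, d, num_measurements, theta, kind, rng):
    """Draw noisy hyperedge measurements (illustrative sketch only).

    labels: list of 0/1 community labels (assumption: two communities).
    d: number of data points involved in each measurement.
    theta: probability that Bernoulli noise flips a measurement.
    kind: "homogeneity" or "parity".
    rng: a random.Random instance, passed in for reproducibility.
    """
    n = len(labels)
    measurements = []
    for _ in range(num_measurements):
        # Assumption: each hyperedge is d distinct points chosen uniformly.
        edge = tuple(sorted(rng.sample(range(n), d)))
        values = [labels[i] for i in edge]
        if kind == "homogeneity":
            # 1 if all points in the hyperedge share a community, else 0.
            clean = int(len(set(values)) == 1)
        elif kind == "parity":
            # Modulo-2 sum of the binary labels.
            clean = sum(values) % 2
        else:
            raise ValueError(f"unknown measurement kind: {kind}")
        # Bernoulli(theta) noise flips the clean outcome.
        noise = int(rng.random() < theta)
        measurements.append((edge, clean ^ noise))
    return measurements


rng = random.Random(0)
labels = [0, 0, 0, 1, 1, 1]
for edge, y in sample_measurements(labels, 3, 5, 0.1, "parity", rng):
    print(edge, y)
```

Each returned pair holds the indices of the measured points and the (possibly flipped) binary outcome. A recovery procedure would take such pairs as input; none is sketched here.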
- Publication:
- arXiv e-prints
- Pub Date:
- September 2017
- DOI:
- 10.48550/arXiv.1709.03670
- arXiv:
- arXiv:1709.03670
- Bibcode:
- 2017arXiv170903670A
- Keywords:
- Computer Science - Information Theory;
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- 25 pages, 7 figures. Submitted to IEEE Transactions on Information Theory