Exact recovery and sharp thresholds of Stochastic Ising Block Model
Abstract
The stochastic block model (SBM) is a random graph model in which edges are generated according to an underlying cluster structure on the vertices. The (ferromagnetic) Ising model, on the other hand, assigns $\pm 1$ labels to vertices according to an underlying graph structure in such a way that if two vertices are connected in the graph, then they are more likely to be assigned the same label. In SBM, one aims to recover the underlying clusters from the graph structure, while in the Ising model, an extensively studied problem is to recover the underlying graph structure from i.i.d. samples (labelings of the vertices). In this paper, we propose a natural composition of SBM and the Ising model, which we call the Stochastic Ising Block Model (SIBM). In SIBM, we take SBM in its simplest form, where $n$ vertices are divided into two equal-sized clusters and edges are drawn independently with probability $p$ within clusters and $q$ across clusters. We then use the graph $G$ generated by the SBM as the underlying graph of the Ising model and draw $m$ i.i.d. samples from it. The objective is to exactly recover the two clusters in SBM from the samples generated by the Ising model, without observing the graph $G$. As the main result of this paper, we establish a sharp threshold $m^\ast$ on the sample complexity of this exact recovery problem in a properly chosen regime, where $m^\ast$ can be calculated from the parameters of SIBM. We show that when $m\ge m^\ast$, one can recover the clusters from $m$ samples in $O(n)$ time as the number of vertices $n$ goes to infinity. When $m<m^\ast$, we further show that for almost all choices of the SIBM parameters, the success probability of any recovery algorithm approaches $0$ as $n\to\infty$.
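To make the two-stage generative process concrete, below is a minimal simulation sketch in Python. The abstract does not specify the Ising measure or a sampler, so the code assumes the standard ferromagnetic form $\mu(\sigma)\propto\exp\big(\beta\sum_{\{u,v\}\in E(G)}\sigma_u\sigma_v\big)$ with inverse temperature $\beta$, approximated by Glauber dynamics; the function names (`sample_sibm`, `recover`), the `n_sweeps` parameter, and the align-and-majority-vote estimator are illustrative assumptions, not the algorithm analyzed in the paper.

```python
import numpy as np


def sample_sibm(n, p, q, beta, m, n_sweeps=200, rng=None):
    """Draw m approximate SIBM samples.

    Assumed (not stated in the abstract): the Ising measure is
    mu(sigma) ~ exp(beta * sum over edges {u,v} of sigma_u * sigma_v),
    approximated here by `n_sweeps` full sweeps of Glauber dynamics.
    """
    rng = np.random.default_rng(rng)
    # Ground-truth balanced partition: first half +1, second half -1.
    sigma_star = np.concatenate([np.ones(n // 2), -np.ones(n - n // 2)])
    # SBM adjacency matrix: edge probability p within clusters, q across them.
    same_cluster = np.equal.outer(sigma_star, sigma_star)
    upper = np.triu(rng.random((n, n)) < np.where(same_cluster, p, q), k=1)
    A = (upper | upper.T).astype(float)

    samples = np.empty((m, n))
    for s in range(m):
        sigma = rng.choice([-1.0, 1.0], size=n)  # fresh restart: samples are i.i.d.
        for _ in range(n_sweeps):
            for i in rng.permutation(n):
                # Glauber update: P(sigma_i = +1 | rest) = 1 / (1 + exp(-2*beta*h_i)),
                # where h_i is the local field sum_j A_ij * sigma_j.
                h_i = A[i] @ sigma
                sigma[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-2.0 * beta * h_i)) else -1.0
        samples[s] = sigma
    return samples, sigma_star


def recover(samples):
    """Baseline estimator: align each sample's global sign, then majority-vote.

    Clusters are identifiable only up to a global sign flip, so each sample
    is flipped to make its first coordinate +1 before the coordinatewise vote.
    """
    aligned = samples * np.sign(samples[:, :1])
    return np.where(aligned.sum(axis=0) >= 0, 1.0, -1.0)


# Example: report agreement with the planted partition, up to a global flip.
samples, truth = sample_sibm(n=200, p=0.25, q=0.05, beta=0.5, m=5, rng=0)
est = recover(samples)
print(max(np.mean(est == truth), np.mean(est == -truth)))
```

The align-and-vote step runs in $O(mn)$ time and is only a plausible baseline for intuition; the paper's $O(n)$-time algorithm and its sharp threshold $m^\ast$ are established in the full text.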
- Publication: arXiv e-prints
- Pub Date: April 2020
- DOI: 10.48550/arXiv.2004.05944
- arXiv: arXiv:2004.05944
- Bibcode: 2020arXiv200405944Y
- Keywords: Mathematics - Probability; Computer Science - Information Theory; Statistics - Machine Learning
- E-Print: Fixed a gap in the original proof of Theorem 5. The new proof of Theorem 5 relies on Lemma 5, which is the main new element in this version.