Model selection for degree-corrected block models
Abstract
The proliferation of models for networks raises challenging problems of model selection: the data are sparse and globally dependent, and models are typically high-dimensional and have large numbers of latent variables. Together, these issues mean that the usual model-selection criteria do not work properly for networks. We illustrate these challenges, and show one way to resolve them, by considering the key network-analysis problem of dividing a graph into communities or blocks of nodes with homogeneous patterns of links to the rest of the network. The standard tool for undertaking this is the stochastic block model, under which the probability of a link between two nodes is a function solely of the blocks to which they belong. This imposes a homogeneous degree distribution within each block; this can be unrealistic, so degree-corrected block models add a parameter for each node, modulating its overall degree. The choice between ordinary and degree-corrected block models matters because they make very different inferences about communities. We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs. We also develop linear-time approximations for log-likelihoods under both the stochastic block model and the degree-corrected model, using belief propagation. Applications to simulated and real networks show excellent agreement with our approximations. Our results thus both solve the practical problem of deciding on degree correction and point to a general approach to model selection in network analysis.
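The contrast the abstract draws between the two models can be illustrated with a minimal sketch. It assumes a common formulation in which the expected number of links between nodes i and j is omega[g_i][g_j] under the stochastic block model and theta_i * theta_j * omega[g_i][g_j] under the degree-corrected model. The function names, the affinity matrix `omega`, and the per-node parameters `theta` are hypothetical choices for illustration and are not taken from the paper.

```python
def expected_edges_sbm(blocks, omega):
    """Expected link counts when they depend only on block membership."""
    return [[omega[r][s] for s in blocks] for r in blocks]


def expected_edges_dcsbm(blocks, omega, theta):
    """Same block structure, with each node's expected degree scaled by theta[i]."""
    base = expected_edges_sbm(blocks, omega)
    n = len(blocks)
    return [[theta[i] * theta[j] * base[i][j] for j in range(n)]
            for i in range(n)]


# Illustrative values (assumptions): four nodes in two blocks.
blocks = [0, 0, 1, 1]
omega = [[0.8, 0.1],
         [0.1, 0.6]]
theta = [1.5, 0.5, 1.0, 1.0]

sbm = expected_edges_sbm(blocks, omega)
dc = expected_edges_dcsbm(blocks, omega, theta)

# Row sums serve as a rough proxy for expected degree (self-pairs included
# here for simplicity).
sbm_degrees = [sum(row) for row in sbm]
dc_degrees = [sum(row) for row in dc]
```

In this toy setting, nodes 0 and 1 share a block and so have identical expected degrees under the plain model, while the degree-corrected model lets them differ. Setting every `theta[i]` to 1 recovers the plain model.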
 Publication:

Journal of Statistical Mechanics: Theory and Experiment
 Pub Date:
 May 2014
 DOI:
 10.1088/1742-5468/2014/05/P05007
 arXiv:
 arXiv:1207.3994
 Bibcode:
 2014JSMTE..05..007Y
 Keywords:

 Computer Science - Social and Information Networks;
 Condensed Matter - Statistical Mechanics;
 Mathematics - Statistics Theory;
 Physics - Physics and Society;
 Statistics - Machine Learning
 E-Print:
 J. Stat. Mech. (2014) P05007