Sparse graphs using exchangeable random measures
Abstract
Statistical network modeling has focused on representing the graph as a discrete structure, namely the adjacency matrix, and considering the exchangeability of this array. In such cases, the Aldous-Hoover representation theorem (Aldous, 1981; Hoover, 1979) applies and informs us that the graph is necessarily either dense or empty. In this paper, we instead consider representing the graph as a measure on $\mathbb{R}_+^2$. For the associated definition of exchangeability in this continuous space, we rely on the Kallenberg representation theorem (Kallenberg, 2005). We show that for certain choices of such exchangeable random measures underlying our graph construction, our network process is sparse with power-law degree distribution. In particular, we build on the framework of completely random measures (CRMs) and use the theory associated with such processes to derive important network properties, such as an urn representation for our analysis and network simulation. Our theoretical results are explored empirically and compared to common network models. We then present a Hamiltonian Monte Carlo algorithm for efficient exploration of the posterior distribution and demonstrate that we are able to recover graphs ranging from dense to sparse (and perform associated tests) based on our flexible CRM-based formulation. We explore network properties in a range of real datasets, including Facebook social circles, a political blogosphere, protein networks, citation networks, and world wide web networks, including networks with hundreds of thousands of nodes and millions of edges.
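As a rough illustration of the "dense or empty" statement, the sketch below assumes the simplest exchangeable adjacency-matrix model, an Erdős-Rényi graph with a fixed edge probability `p`. This model and the helper name `edge_density` are chosen here for illustration only and are not taken from the paper. With `p` fixed, the fraction of node pairs joined by an edge stays near `p` as the number of nodes grows, so the edge count scales quadratically in the number of nodes, which is the dense regime.

```python
import random


def edge_density(n, p, seed=0):
    """Sample an Erdős-Rényi graph G(n, p) and return edges / node pairs.

    Illustrative assumption: each of the n*(n-1)/2 unordered node pairs
    is joined independently with the same probability p.
    """
    rng = random.Random(seed)
    pairs = n * (n - 1) // 2
    edges = sum(1 for _ in range(pairs) if rng.random() < p)
    return edges / pairs


# The density should hover around p for every n (dense regime).
for n in (100, 200, 400):
    print(n, round(edge_density(n, p=0.1), 3))
```

A sparse graph sequence, by contrast, would show this ratio shrinking toward zero as `n` increases.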
 Publication:

arXiv e-prints
 Pub Date:
 January 2014
 arXiv:
 arXiv:1401.1137
 Bibcode:
 2014arXiv1401.1137C
 Keywords:

 Statistics - Methodology;
 Computer Science - Social and Information Networks;
 Mathematics - Statistics Theory;
 Statistics - Machine Learning
 E-Print:
 New title. Extended version