CoarseGrained Topology Estimation via Graph Sampling
Abstract
Many online networks are measured and studied via sampling techniques, which typically collect a relatively small fraction of nodes and their associated edges. Past work in this area has primarily focused on obtaining a representative sample of nodes and on efficient estimation of local graph properties (such as node degree distribution or any node attribute) based on that sample. However, less is known about estimating the global topology of the underlying graph. In this paper, we show how to efficiently estimate the coarsegrained topology of a graph from a probability sample of nodes. In particular, we consider that nodes are partitioned into categories (e.g., countries or work/study places in OSNs), which naturally defines a weighted category graph. We are interested in estimating (i) the size of categories and (ii) the probability that nodes from two different categories are connected. For each of the above, we develop a family of estimators for designbased inference under uniform or nonuniform sampling, employing either of two measurement strategies: induced subgraph sampling, which relies only on information about the sampled nodes; and star sampling, which also exploits category information about the neighbors of sampled nodes. We prove consistency of these estimators and evaluate their efficiency via simulation on fully known graphs. We also apply our methodology to a sample of Facebook users to obtain a number of category graphs, such as the college friendship graph and the country friendship graph; we share and visualize the resulting data at www.geosocialmap.com.
 Publication:

arXiv eprints
 Pub Date:
 May 2011
 arXiv:
 arXiv:1105.5488
 Bibcode:
 2011arXiv1105.5488K
 Keywords:

 Computer Science  Social and Information Networks;
 Physics  Data Analysis;
 Statistics and Probability;
 Physics  Physics and Society