Finding community structure in very large networks
Abstract
The discovery and analysis of community structure in networks is a topic of considerable recent interest within the physics community, but most methods proposed so far are unsuitable for very large networks because of their computational cost. Here we present a hierarchical agglomeration algorithm for detecting community structure which is faster than many competing algorithms: its running time on a network with n vertices and m edges is O(md log n), where d is the depth of the dendrogram describing the community structure. Many real-world networks are sparse and hierarchical, with m ~ n and d ~ log n, in which case our algorithm runs in essentially linear time, O(n log^2 n). As an example of the application of this algorithm we use it to analyze a network of items for sale on the web site of a large online retailer, items in the network being linked if they are frequently purchased by the same buyer. The network has more than 400 000 vertices and 2 × 10^6 edges. We show that our algorithm can extract meaningful communities from this network, revealing large-scale patterns present in the purchasing habits of customers.
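To make the idea of hierarchical agglomeration concrete, the following is a minimal sketch, not the paper's implementation. It assumes the quantity being greedily increased is modularity Q (the abstract does not name the objective), and it rescans every connected pair of communities at each step instead of using the efficient data structures that would be needed to reach the O(md log n) bound. The function name `greedy_communities` and the edge-list input format are hypothetical choices made for illustration.

```python
from collections import defaultdict


def greedy_communities(edges):
    """Agglomerate communities greedily; return (best_partition, best_q).

    Assumes an undirected simple graph given as (u, v) pairs with no
    self-loops or duplicate edges. Naive illustrative version only.
    """
    m = len(edges)
    w = 1.0 / (2 * m)          # weight of a single edge end
    e = defaultdict(dict)      # e[i][j]: fraction of edge ends joining i and j
    a = defaultdict(float)     # a[i]: fraction of edge ends attached to i
    for u, v in edges:
        e[u][v] = e[v][u] = w
        a[u] += w
        a[v] += w

    members = {v: {v} for v in a}          # every vertex starts alone
    q = -sum(x * x for x in a.values())    # modularity of that partition
    best_q, best = q, [set(s) for s in members.values()]

    while True:
        # Scan all connected pairs for the largest modularity change.
        pair, gain = None, None
        for i, nbrs in e.items():
            for j, eij in nbrs.items():
                dq = 2.0 * (eij - a[i] * a[j])
                if gain is None or dq > gain:
                    pair, gain = (i, j), dq
        if pair is None:
            break              # no connected pairs left to join
        i, j = pair

        # Merge community j into community i.
        del e[i][j]
        for k, ejk in list(e[j].items()):
            if k == i:
                continue
            e[i][k] = e[i].get(k, 0.0) + ejk
            e[k][i] = e[i][k]
            del e[k][j]
        del e[j]
        a[i] += a.pop(j)
        members[i] |= members.pop(j)

        # Each merge is one level of the dendrogram; remember the best cut.
        q += gain
        if q > best_q:
            best_q, best = q, [set(s) for s in members.values()]
    return best, best_q


# Two triangles joined by a single bridging edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
communities, q = greedy_communities(edges)
```

On this toy graph the best cut of the dendrogram separates the two triangles. The full rescan on every merge makes the sketch far too slow for a network with hundreds of thousands of vertices; it is meant only to show the join-and-track-the-best-level structure of an agglomerative method.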
 Publication:

Physical Review E
 Pub Date:
 December 2004
 DOI:
 10.1103/PhysRevE.70.066111
 arXiv:
 arXiv:cond-mat/0408187
 Bibcode:
 2004PhRvE..70f6111C
 Keywords:

 89.75.Hc;
 05.10.-a;
 87.23.Ge;
 89.20.Hh;
 Networks and genealogical trees;
 Computational methods in statistical physics and nonlinear dynamics;
 Dynamics of social systems;
 World Wide Web, Internet;
 Condensed Matter - Statistical Mechanics;
 Condensed Matter - Disordered Systems and Neural Networks
 EPrint:
 Phys. Rev. E 70, 066111 (2004)