A statistical interpretation of spectral embedding: the generalised random dot product graph
Abstract
A generalisation of a latent position network model known as the random dot product graph is considered. We show that, whether the normalised Laplacian or adjacency matrix is used, the vector representations of nodes obtained by spectral embedding, using the largest eigenvalues by magnitude, provide strongly consistent latent position estimates with asymptotically Gaussian error, up to indefinite orthogonal transformation. The mixed membership and standard stochastic block models constitute special cases where the latent positions live respectively inside or on the vertices of a simplex, crucially, without assuming the underlying block connectivity probability matrix is positive-definite. Estimation via spectral embedding can therefore be achieved by respectively estimating this simplicial support, or fitting a Gaussian mixture model. In the latter case, the use of $K$-means (with Euclidean distance), as has been previously recommended, is suboptimal and for identifiability reasons unsound. Indeed, Euclidean distances and angles are not preserved under indefinite orthogonal transformation, and we show stochastic block model examples where such quantities vary appreciably. Empirical improvements in link prediction (over the random dot product graph), as well as the potential to uncover richer latent structure (than posited under the mixed membership or standard stochastic block models) are demonstrated in a cybersecurity example.
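As a rough illustration of the embedding step described in the abstract, the sketch below computes an adjacency spectral embedding that keeps the eigenvalues largest in magnitude, so negative eigenvalues are retained alongside positive ones. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `adjacency_spectral_embedding`, the scaling of eigenvectors by the square root of the absolute eigenvalues, and the two-block connectivity matrix `B` are illustrative choices, and only NumPy is assumed.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed nodes with the d eigenvalues of symmetric A largest in magnitude.

    Returns the n x d embedding and the signs of the selected eigenvalues.
    (Hypothetical helper; the name and scaling are assumptions of this sketch.)
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(A)          # A is assumed symmetric
    top = np.argsort(np.abs(eigvals))[::-1][:d]   # order by |eigenvalue|, not value
    lam, U = eigvals[top], eigvecs[:, top]
    X_hat = U * np.sqrt(np.abs(lam))              # scale each column by |lambda|^(1/2)
    return X_hat, np.sign(lam)

# Illustrative two-block stochastic block model whose connectivity matrix B
# is indefinite (one positive and one negative eigenvalue). Values are made up.
rng = np.random.default_rng(0)
n = 400
B = np.array([[0.1, 0.6],
              [0.6, 0.2]])
z = rng.integers(0, 2, size=n)                    # block label of each node
P = B[z][:, z]                                    # P[i, j] = B[z[i], z[j]]
upper = np.triu(rng.random((n, n)) < P, k=1)      # sample edges above the diagonal
A = (upper + upper.T).astype(float)               # symmetric, no self-loops

X_hat, signs = adjacency_spectral_embedding(A, d=2)
```

Because `signs` here contains both a positive and a negative entry, the rows of `X_hat` are identified only up to a transformation that preserves the corresponding indefinite inner product. This is the setting in which the abstract argues for fitting a Gaussian mixture model to the embedded points rather than running $K$-means on Euclidean distances.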
 Publication:

arXiv e-prints
 Pub Date:
 September 2017
 arXiv:
 arXiv:1709.05506
 Bibcode:
 2017arXiv170905506R
 Keywords:

 Statistics - Machine Learning;
 Computer Science - Machine Learning
 E-Print:
 30 pages