A Sparse Random Graph Model for Sparse Directed Networks
Abstract
An increasingly urgent task in analysis of networks is to develop statistical models that include contextual information in the form of covariates while respecting degree heterogeneity and sparsity. In this paper, we propose a new parameter-sparse random graph model for density-sparse directed networks, with parameters to explicitly account for all these features. The resulting objective function of our model is akin to that of the high-dimensional logistic regression, with the key difference that the probabilities are allowed to go to zero at a certain rate to accommodate sparse networks. We show that under appropriate conditions, an estimator obtained by the familiar penalized likelihood with an $\ell_1$ penalty to achieve parameter sparsity can alleviate the curse of dimensionality, and crucially is selection and rate consistent. Interestingly, inference on the covariate parameter can be conducted straightforwardly after the model fitting, without the need of the kind of debiasing commonly employed in $\ell_1$-penalized likelihood estimation. Simulation and data analysis corroborate our theoretical findings. In developing our model, we provide the first result highlighting the fallacy of what we call data-selective inference, a common practice of artificially truncating the sample by throwing away nodes based on their connections, by examining the estimation bias in the Erdős-Rényi model theoretically and in the stochastic block model empirically.
 Publication:

arXiv e-prints
 Pub Date:
 August 2021
 arXiv:
 arXiv:2108.09504
 Bibcode:
 2021arXiv210809504S
 Keywords:

 Mathematics - Statistics Theory;
 Statistics - Applications;
 Statistics - Methodology
 E-Print:
 64 pages, 5 figures, 4 tables. arXiv admin note: text overlap with arXiv:2010.13604