High Dimensional Logistic Regression Under Network Dependence
Abstract
Logistic regression is one of the most fundamental methods for modeling the probability of a binary outcome based on a collection of covariates. However, the classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure. This necessitates the development of models that can simultaneously handle both the network peer-effect (arising from neighborhood interactions) and the effect of high-dimensional covariates. In this paper, we develop a framework for incorporating such dependencies in a high-dimensional logistic regression model by introducing a quadratic interaction term, as in the Ising model, designed to capture pairwise interactions from the underlying network. The resulting model can also be viewed as an Ising model, where the node-dependent external fields linearly encode the high-dimensional covariates. We propose a penalized maximum pseudo-likelihood method for estimating the network peer-effect and the effect of the covariates, which, in addition to handling the high-dimensionality of the parameters, conveniently avoids the computational intractability of the maximum likelihood approach. Consequently, our method is computationally efficient and, under various standard regularity conditions, our estimate attains the classical high-dimensional rate of consistency. In particular, our results imply that even under network dependence it is possible to consistently estimate the model parameters at the same rate as in classical logistic regression, when the true parameter is sparse and the underlying network is not too dense. As a consequence of the general results, we derive the rates of consistency of our estimator for various natural graph ensembles, such as bounded degree graphs, sparse Erdős-Rényi random graphs, and stochastic block models.
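The penalized maximum pseudo-likelihood idea described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: outcomes are coded as x_i in {-1, +1}, A is a symmetric interaction matrix with zero diagonal, the conditional field at node i is h_i = beta * sum_j A_ij x_j + theta^T z_i (scaling constants may differ from the paper's parametrization), the L1 penalty is applied to (beta, theta) jointly, and the solver is plain proximal gradient descent. The function names are hypothetical.

```python
import numpy as np


def neg_log_pseudolikelihood(beta, theta, x, A, Z):
    """Average negative log pseudo-likelihood (illustrative parametrization).

    Assumes P(x_i | rest) = exp(x_i * h_i) / (2 * cosh(h_i)) with
    h_i = beta * (A @ x)_i + Z[i] @ theta, which gives the loss
    log(1 + exp(-2 * x_i * h_i)) for each node.
    """
    h = beta * (A @ x) + Z @ theta
    return np.mean(np.logaddexp(0.0, -2.0 * x * h))


def fit_penalized_mple(x, A, Z, lam, step=0.1, n_iter=2000):
    """L1-penalized pseudo-likelihood fit by proximal gradient (ISTA).

    This solver is a stand-in chosen for brevity, not the authors' algorithm.
    Returns (beta_hat, theta_hat).
    """
    n, d = Z.shape
    # Design matrix: the neighborhood sum acts as one extra "covariate".
    F = np.column_stack([A @ x, Z])
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        h = F @ w
        # d/dh [log(2 cosh h) - x h] = tanh(h) - x
        grad = F.T @ (np.tanh(h) - x) / n
        w = w - step * grad
        # Soft-thresholding: proximal operator of step * lam * ||w||_1.
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w[0], w[1:]
```

Because each node's conditional distribution is a logistic model in the neighborhood sum and the covariates, the objective is convex and avoids the normalizing constant of the full Ising likelihood, which is the computational advantage the abstract refers to.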
 Publication:

arXiv e-prints
 Pub Date:
 October 2021
 arXiv:
 arXiv:2110.03200
 Bibcode:
 2021arXiv211003200M
 Keywords:

 Mathematics - Statistics Theory;
 Statistics - Methodology
 E-Print:
 34 pages