Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

doi:10.48550/arXiv.2001.06194

Distributed statistical inference has recently attracted considerable attention. The asymptotic efficiency of the maximum likelihood estimator (MLE), the one-step MLE, and the aggregated estimating equation estimator is established for generalized linear models under the "large $n$, diverging $p_n$" framework, where the dimension of the covariates $p_n$ grows to infinity at a polynomial rate $o(n^\alpha)$ for some $0<\alpha<1$. A novel method is then proposed that obtains an asymptotically efficient estimator for large-scale distributed data using two rounds of communication. This method relaxes the assumption on the number of servers, which makes it more practical for real-world applications. Simulations and a case study demonstrate the satisfactory finite-sample performance of the proposed estimators.
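To make the idea of a two-round distributed estimator concrete, the following is a minimal sketch for logistic regression, one common generalized linear model. It is an illustration under stated assumptions and not necessarily the estimator proposed in the paper: round one averages the local MLEs, and round two sums the local scores and information matrices at that average and takes a single Newton step. All function names (`local_mle`, `score_and_information`, `two_round_estimator`, `simulate_shards`) and the simulated data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def score_and_information(X, y, beta):
    """Score vector and Fisher information of the logistic log-likelihood."""
    mu = sigmoid(X @ beta)
    grad = X.T @ (y - mu)
    info = (X * (mu * (1.0 - mu))[:, None]).T @ X
    return grad, info


def local_mle(X, y, iters=25):
    """Newton-Raphson logistic MLE computed on a single server's data."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        grad, info = score_and_information(X, y, beta)
        beta = beta + np.linalg.solve(info, grad)
    return beta


def two_round_estimator(shards):
    """Assumed two-round scheme: average local MLEs, then one Newton step."""
    # Round 1: each server sends its local MLE; the hub averages them.
    beta_bar = np.mean([local_mle(X, y) for X, y in shards], axis=0)
    # Round 2: each server sends its score and information at beta_bar;
    # the hub sums them and takes a single Newton step.
    parts = [score_and_information(X, y, beta_bar) for X, y in shards]
    grad = sum(g for g, _ in parts)
    info = sum(h for _, h in parts)
    return beta_bar + np.linalg.solve(info, grad)


def simulate_shards(n=20000, p=5, k=10, seed=0):
    """Simulated logistic data split evenly across k hypothetical servers."""
    rng = np.random.default_rng(seed)
    beta_true = np.linspace(-1.0, 1.0, p)
    X = rng.standard_normal((n, p))
    y = (rng.random(n) < sigmoid(X @ beta_true)).astype(float)
    shards = list(zip(np.array_split(X, k), np.array_split(y, k)))
    return shards, X, y
```

Calling `two_round_estimator(shards)` on the output of `simulate_shards()` and comparing it with `local_mle(X, y)` on the pooled data gives a rough check of how closely the two-round estimate tracks the full-data MLE in this toy setting.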

Publication:

arXiv e-prints

Pub Date:

January 2020

DOI:

10.48550/arXiv.2001.06194

arXiv:

arXiv:2001.06194

Bibcode:

2020arXiv200106194Z

Keywords:

Statistics - Methodology;
Computer Science - Distributed, Parallel, and Cluster Computing;
Computer Science - Machine Learning;
Statistics - Machine Learning
