Generalized k-Means in GLMs with Applications to the Outbreak of COVID-19 in the United States
Abstract
Generalized $k$-means can incorporate any similarity or dissimilarity measure for clustering. By choosing the well-known likelihood ratio or $F$-statistic as the dissimilarity measure, this work proposes a method based on generalized $k$-means to group statistical models. Given the number of clusters $k$, the method is built on hypothesis tests between statistical models. If $k$ is unknown, the method can be combined with the generalized information criterion (GIC) to select the best $k$ automatically. The article investigates AIC and BIC as special cases. Theoretical and simulation results show that the number of clusters can be identified by BIC but not by AIC. The resulting method for GLMs is used to group the state-level time series patterns of the COVID-19 outbreak in the United States. A further study shows that the statistical models in different clusters are significantly different from each other. This study confirms the result given by the proposed method based on generalized $k$-means.
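To make the idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes Gaussian linear models (a special case of GLMs), uses each unit's residual sum of squares under a cluster's pooled fit as the dissimilarity (equivalent, up to constants, to a Gaussian likelihood comparison), and scores each $k$ with a standard Gaussian BIC. The function names (`fit_ols`, `cluster_models`, `bic`), the restart scheme, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for one design matrix and response."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def rss(X, y, beta):
    """Residual sum of squares of (X, y) under coefficients beta."""
    r = y - X @ beta
    return float(r @ r)


def _one_run(units, k, n_iter, rng):
    m = len(units)
    # Initialise the k cluster models from k randomly chosen units.
    centers = [fit_ols(*units[i]) for i in rng.choice(m, size=k, replace=False)]
    labels = np.full(m, -1)
    for _ in range(n_iter):
        # Assignment step: each unit joins the cluster model that fits it best.
        new_labels = np.array(
            [int(np.argmin([rss(X, y, b) for b in centers])) for X, y in units]
        )
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: refit each cluster model on its pooled member data.
        for c in range(k):
            members = [units[i] for i in range(m) if labels[i] == c]
            if members:  # an empty cluster keeps its previous model
                Xc = np.vstack([X for X, _ in members])
                yc = np.concatenate([y for _, y in members])
                centers[c] = fit_ols(Xc, yc)
    total = sum(rss(X, y, centers[labels[i]]) for i, (X, y) in enumerate(units))
    return labels, centers, total


def cluster_models(units, k, n_starts=10, n_iter=50, seed=0):
    """Best of several random restarts, judged by total RSS."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_starts):
        out = _one_run(units, k, n_iter, rng)
        if best is None or out[2] < best[2]:
            best = out
    return best


def bic(total_rss, n_obs, n_params):
    """Gaussian BIC up to an additive constant."""
    return n_obs * np.log(total_rss / n_obs) + n_params * np.log(n_obs)


# Synthetic illustration only (not COVID-19 data): six units, two true models.
rng = np.random.default_rng(1)
true_betas = [np.array([0.0, 1.0])] * 3 + [np.array([5.0, -1.0])] * 3
units = []
for beta in true_betas:
    x = rng.uniform(0.0, 1.0, size=30)
    X = np.column_stack([np.ones_like(x), x])
    y = X @ beta + rng.normal(0.0, 0.1, size=30)
    units.append((X, y))

n_obs = sum(len(y) for _, y in units)
p = units[0][0].shape[1]
results = {k: cluster_models(units, k) for k in (1, 2, 3)}
bics = {k: bic(res[2], n_obs, k * p) for k, res in results.items()}
best_k = min(bics, key=bics.get)  # k with the smallest BIC
labels = results[2][0]            # cluster labels for the k = 2 fit
```

Swapping the penalty `n_params * np.log(n_obs)` for `2 * n_params` would give an AIC-style criterion under the same assumptions, which is one way to compare the two penalties on simulated data.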
 Publication:

arXiv e-prints
 Pub Date:
 August 2020
 DOI:
 10.48550/arXiv.2008.03838
 arXiv:
 arXiv:2008.03838
 Bibcode:
 2020arXiv200803838Z
 Keywords:

 Statistics - Methodology;
 62H30;
 62J12