Change Point Analysis of Multivariate Data: Using Multivariate Rank-based Distribution-free Nonparametric Testing via Measure Transportation with Applications in Tumor Microarrays and Dementia
Abstract
In this paper, I propose a general algorithm for multiple change point analysis via multivariate distribution-free nonparametric testing, based on a concept of ranks defined by measure transportation. Multivariate ranks defined this way share an important property with the usual one-dimensional ranks: both are distribution-free. This property allows for the creation of nonparametric tests that are distribution-free under the null hypothesis. The method has applications in a variety of fields, and in this paper I apply the algorithm to a microarray dataset for individuals with bladder tumors, an ECoG snapshot for a patient with epilepsy, and trajectories of CASI scores by education level and dementia status. In the last application, each change point denotes a shift in the rate of change of the Cognitive Abilities score over the years, indicating the existence of preclinical dementia. I estimate the number of change points and the location of each within a multivariate series of time-ordered observations. The paper examines the multiple change point question in a broad setting in which the observed distributions and the number of change points are unspecified, rather than assuming that the observations follow a parametric model or that there is a single change point, as many works in this area do. The objective is to create an algorithm for change point detection while making as few assumptions about the dataset as possible. I present the theoretical properties of the new algorithm and the conditions under which the approximate number of change points and their locations can be estimated. The algorithm has also been implemented in the R package recp, which is available on GitHub. A section of this paper is dedicated to the execution of the procedure and the use of the recp package.
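To make the idea of ranks defined by measure transportation concrete, here is a minimal Python sketch. It is not the paper's algorithm and not the recp package. It assumes one common construction of empirical multivariate ranks: each observation is matched to a distinct point of a fixed reference set so that the total squared distance is minimised. The function names (`reference_points`, `transport_ranks`), the seeded uniform reference set, and the brute-force search over permutations are all illustrative assumptions, and the search is only feasible for very small samples.

```python
import itertools
import random


def reference_points(n, d, seed=0):
    """Fixed reference set in [0, 1]^d (assumed choice: seeded uniform draws)."""
    rng = random.Random(seed)
    return [tuple(rng.random() for _ in range(d)) for _ in range(n)]


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def transport_ranks(sample, refs):
    """Match each sample point to a distinct reference point so that the
    total squared distance is minimal; the matched reference point plays
    the role of the observation's multivariate rank.

    Brute force over all permutations, so only practical for n <= 8 or so.
    """
    n = len(sample)
    best_cost, best_perm = None, None
    for perm in itertools.permutations(range(n)):
        cost = sum(sq_dist(sample[i], refs[perm[i]]) for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_perm = cost, perm
    return [refs[j] for j in best_perm]


# Two samples from very different distributions, ranked against the same
# reference set.
rng = random.Random(1)
refs = reference_points(6, 2)
gaussian = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(6)]
skewed = [(rng.expovariate(1.0), rng.expovariate(0.2)) for _ in range(6)]

ranks_gaussian = transport_ranks(gaussian, refs)
ranks_skewed = transport_ranks(skewed, refs)

# In one dimension, with reference points i/n, the same matching
# reproduces the usual ranks divided by n.
ranks_1d = transport_ranks(
    [(0.3,), (-1.2,), (5.0,)],
    [(1 / 3,), (2 / 3,), (1.0,)],
)
```

In both two-dimensional cases the ranks are a rearrangement of the same reference set, whatever distribution generated the data. That is the sense in which statistics built from such ranks can be distribution-free under the null hypothesis. For samples of realistic size, the permutation search would need to be replaced by a proper assignment solver.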
- Publication: arXiv e-prints
- Pub Date: August 2021
- arXiv: arXiv:2108.05979
- Bibcode: 2021arXiv210805979N
- Keywords: Statistics - Methodology; 62G05; 62G10; 62G30
- E-Print: 20 pages and 4 figures