Segmentation of high dimensional means over multi-dimensional change points and connections to regression trees
Abstract
This article is motivated by the objective of providing a new analytically tractable and fully frequentist framework to characterize and implement regression trees while also allowing a multivariate (potentially high dimensional) response. The connection to regression trees is made by a high dimensional model with dynamic mean vectors over multi-dimensional change axes. Our theoretical analysis is carried out under a single two dimensional change point setting. An optimal rate of convergence of the proposed estimator is obtained, which in turn allows existence of limiting distributions. Distributional behavior of change point estimates are split into two distinct regimes, the limiting distributions under each regime is then characterized, in turn allowing construction of asymptotically valid confidence intervals for $2d$-location of change. All results are obtained under a high dimensional scaling $s\log^2 p=o(T_wT_h),$ where $p$ is the response dimension, $s$ is a sparsity parameter, and $T_w,T_h$ are sampling periods along change axes. We characterize full regression trees by defining a multiple multi-dimensional change point model. Natural extensions of the single $2d$-change point estimation methodology are provided. Two applications, first on segmentation of {\it Infra-red astronomy satellite (IRAS)} data and second to segmentation of digital images are provided. Methodology and theoretical results are supported with monte-carlo simulations.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2021
- DOI:
- 10.48550/arXiv.2105.10017
- arXiv:
- arXiv:2105.10017
- Bibcode:
- 2021arXiv210510017K
- Keywords:
-
- Statistics - Methodology;
- Mathematics - Statistics Theory;
- Statistics - Machine Learning
- E-Print:
- All implementations carried out in R (code available upon request)