Nonparametric Clustering of Functional Data Using Pseudo-Densities
Abstract
We study nonparametric clustering of smooth random curves based on the L2 gradient flow associated with a pseudo-density functional, and we show that the clustering is well defined at both the population and the sample level. We provide an algorithm to mark significant local modes, which are associated with informative sample clusters, and we derive its consistency properties. Our theory is developed under weak assumptions, which essentially reduce to the integrability of the random curves, and it does not require projecting the random curves onto a finite-dimensional subspace. However, if the underlying probability distribution is supported on a finite-dimensional subspace, we show that the pseudo-density and the expectation of a kernel density estimator induce the same gradient flow, and therefore the same clustering. Although our theory is developed for smooth curves that belong to an infinite-dimensional function space, we also provide consistent procedures that can be used with real data (discretized and noisy observations).
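To make the idea of clustering by a gradient flow toward local modes concrete, here is a minimal sketch in Python. It is not the algorithm of the paper. It swaps in a plain Gaussian-kernel mean-shift iteration on discretized curves, with the L2 distance approximated by a Riemann sum, and it omits the step that marks significant modes. The function names (`l2_sq_dists`, `mean_shift_curves`, `label_by_mode`), the bandwidth `h`, the merge tolerance, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def l2_sq_dists(X, Y, dt):
    """Squared L2 distances between rows of X and Y (Riemann-sum approximation)."""
    diff = X[:, None, :] - Y[None, :, :]
    return (diff ** 2).sum(axis=2) * dt

def mean_shift_curves(curves, dt, h, n_iter=200, tol=1e-8):
    """Move each curve uphill on a Gaussian-kernel surrogate of a pseudo-density.

    Illustrative stand-in for a gradient flow; the bandwidth h is an assumption.
    """
    points = curves.copy()
    for _ in range(n_iter):
        # Kernel weights between current positions and the fixed sample curves.
        w = np.exp(-l2_sq_dists(points, curves, dt) / (2.0 * h ** 2))
        new_points = (w @ curves) / w.sum(axis=1, keepdims=True)
        # Largest L2 displacement in this iteration.
        shift = np.sqrt(((new_points - points) ** 2).sum(axis=1) * dt).max()
        points = new_points
        if shift < tol:
            break
    return points

def label_by_mode(points, dt, merge_tol):
    """Give the same label to curves whose limit points are within merge_tol in L2."""
    labels = -np.ones(len(points), dtype=int)
    modes = []
    for i, p in enumerate(points):
        for k, m in enumerate(modes):
            if np.sqrt(((p - m) ** 2).sum() * dt) < merge_tol:
                labels[i] = k
                break
        else:
            # No existing mode is close enough: start a new cluster.
            modes.append(p)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)

# Synthetic example (hypothetical data): two groups of smooth curves on [0, 1].
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
dt = t[1] - t[0]
amps = np.concatenate([1.0 + 0.1 * rng.standard_normal(20),
                       -1.0 + 0.1 * rng.standard_normal(20)])
curves = amps[:, None] * np.sin(2.0 * np.pi * t)[None, :]

limits = mean_shift_curves(curves, dt, h=0.3)
labels, modes = label_by_mode(limits, dt, merge_tol=0.1)
```

In this toy setting each curve is assigned to the mode its ascent path converges to, which mirrors the basin-of-attraction view of clustering described in the abstract. The choice of bandwidth strongly affects how many modes appear, and real data would need smoothing of the noisy, discretized observations first.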
- Publication: arXiv e-prints
- Pub Date: January 2016
- DOI: 10.48550/arXiv.1601.07872
- arXiv: arXiv:1601.07872
- Bibcode: 2016arXiv160107872C
- Keywords: Mathematics - Statistics Theory; Statistics - Methodology
- E-Print: Electron. J. Statist., Volume 10, Number 2 (2016), 2922-2972