Estimating the number of clusters of a Block Markov Chain
Abstract
Clustering algorithms frequently require the number of clusters to be chosen in advance, but it is usually not clear how to do this. To tackle this challenge when clustering within sequential data, we present a method for estimating the number of clusters when the data is a trajectory of a Block Markov Chain. Block Markov Chains are Markov Chains that exhibit a block structure in their transition matrix. The method considers a matrix that counts the number of transitions between different states within the trajectory, and transforms this into a spectral embedding whose dimension is set via singular value thresholding. The number of clusters is subsequently estimated via density-based clustering of this spectral embedding, an approach inspired by literature on the Stochastic Block Model. By leveraging and augmenting recent results on the spectral concentration of random matrices with Markovian dependence, we show that the method is asymptotically consistent - in spite of the dependencies between the count matrix's entries, and even when the count matrix is sparse. We also present a numerical evaluation of our method, and compare it to alternatives.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2024
- DOI:
- 10.48550/arXiv.2407.18287
- arXiv:
- arXiv:2407.18287
- Bibcode:
- 2024arXiv240718287V
- Keywords:
-
- Statistics - Machine Learning;
- Computer Science - Machine Learning;
- Mathematics - Probability;
- 62H30;
- 60J10;
- 60B20;
- 60J20
- E-Print:
- 46 pages, 13 figures, 6 tables, 7 algorithms