Rank-based linkage I: triplet comparisons and oriented simplicial complexes
Abstract
Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and ``similarity'' between objects need be neither numerical nor symmetrical. All an object needs to do is rank nearby objects by similarity to itself, using a Comparator which is transitive, but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$ where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the links whose in-sway is at least $t$, and partition $S$ into components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of out-ordered digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not ``rip apart'' the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Open combinatorial problems are presented in the last section.
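The final step described above, thresholding the linkage graph and partitioning $S$ into components of $(S, \mathcal{L}_t)$, can be illustrated with a minimal Python sketch. It assumes the in-sway values $\sigma(\{x, y\})$ have already been computed; the abstract does not define how in-sway is derived, so the numbers below are hypothetical, as are the function name `components_at_threshold` and the use of a union-find structure, which is one convenient choice rather than the authors' implementation.

```python
from collections import defaultdict


def components_at_threshold(objects, in_sway, t):
    """Partition `objects` into connected components of (S, L_t).

    `in_sway` maps each unordered link frozenset({x, y}) to its weight;
    L_t keeps only the links whose weight is at least t.
    """
    parent = {x: x for x in objects}

    def find(x):
        # Follow parent pointers to the root, halving paths along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for link, weight in in_sway.items():
        if weight >= t:
            x, y = tuple(link)
            root_x, root_y = find(x), find(y)
            if root_x != root_y:
                parent[root_x] = root_y

    groups = defaultdict(set)
    for x in objects:
        groups[find(x)].add(x)
    # Sort for a deterministic, comparable representation of the partition.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical objects and in-sway values, for illustration only.
S = ["a", "b", "c", "d", "e"]
sigma = {
    frozenset({"a", "b"}): 5,
    frozenset({"b", "c"}): 3,
    frozenset({"c", "d"}): 1,
    frozenset({"d", "e"}): 4,
}

partitions = {t: components_at_threshold(S, sigma, t) for t in (1, 3, 4, 5)}
for t, blocks in partitions.items():
    print(t, blocks)
```

Because $\mathcal{L}_t$ can only lose links as $t$ grows, the partition at a larger threshold refines the partition at a smaller one; in this toy example the single block at $t = 1$ splits progressively as $t$ rises to 5.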
- Publication: arXiv e-prints
- Pub Date: February 2023
- DOI: 10.48550/arXiv.2302.02200
- arXiv: arXiv:2302.02200
- Bibcode: 2023arXiv230202200D
- Keywords: Mathematics - Combinatorics; Mathematics - Statistics Theory; 62H30 (Primary) 05C20, 05E45, 05C76 (Secondary); G.4
- E-Print: 37 pages, 12 figures