The Central Spanning Tree Problem
Abstract
Spanning trees are an important primitive in many data analysis tasks, when a data set needs to be summarized in terms of its "skeleton", or when a tree-shaped graph over all observations is required for downstream processing. Popular definitions of spanning trees include the minimum spanning tree and the optimum distance spanning tree, a.k.a. the minimum routing cost tree. When searching for the shortest spanning tree but admitting additional branching points, even shorter spanning trees can be realized: Steiner trees. Unfortunately, both minimum spanning and Steiner trees are not robust with respect to noise in the observations; that is, small perturbations of the original data set often lead to drastic changes in the associated spanning trees. In response, we make two contributions when the data lies in a Euclidean space: on the theoretical side, we introduce a new optimization problem, the "(branched) central spanning tree", which subsumes all previously mentioned definitions as special cases. On the practical side, we show empirically that the (branched) central spanning tree is more robust to noise in the data, and as such is better suited to summarize a data set in terms of its skeleton. We also propose a heuristic to address the NP-hard optimization problem, and illustrate its use on single cell RNA expression data from biology and 3D point clouds of plants.
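To make the two classical objectives named in the abstract concrete, the following is a minimal sketch in plain Python. It is not the paper's heuristic and does not compute a central spanning tree. It builds a Euclidean minimum spanning tree with Prim's algorithm and then evaluates both the total edge length (the minimum spanning tree objective) and the routing cost, taken here as the sum of tree path lengths over all pairs of points. The function names `minimum_spanning_tree`, `tree_length`, and `routing_cost` are hypothetical and chosen only for this illustration.

```python
import math
from collections import defaultdict


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (i, j) index pairs forming a spanning tree.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_parent = [-1] * n       # tree vertex realizing that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the not-yet-connected point closest to the current tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            edges.append((best_parent[u], u))
        # Relax the connection cost of all remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_parent[v] = u
    return edges


def tree_length(points, edges):
    """Total Euclidean length of the tree edges."""
    return sum(math.dist(points[i], points[j]) for i, j in edges)


def routing_cost(points, edges):
    """Sum of tree path lengths over all unordered pairs of points."""
    adj = defaultdict(list)
    for i, j in edges:
        w = math.dist(points[i], points[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    total = 0.0
    for s in range(len(points)):
        # Depth-first traversal; paths in a tree are unique.
        dist = {s: 0.0}
        stack = [s]
        while stack:
            u = stack.pop()
            for v, w in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + w
                    stack.append(v)
        # Count each unordered pair once.
        total += sum(d for t, d in dist.items() if t > s)
    return total


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    mst = minimum_spanning_tree(pts)
    print(mst, tree_length(pts, mst), routing_cost(pts, mst))
```

Evaluating both quantities on the same tree shows that they are distinct objectives: a tree that minimizes total length need not minimize routing cost in general, which is why the abstract lists them as separate definitions.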
 Publication:
 arXiv e-prints
 Pub Date:
 April 2024
 DOI:
 10.48550/arXiv.2404.06447
 arXiv:
 arXiv:2404.06447
 Bibcode:
 2024arXiv240406447F
 Keywords:
 Computer Science - Discrete Mathematics;
 Computer Science - Computer Vision and Pattern Recognition;
 Computer Science - Data Structures and Algorithms;
 Mathematics - Combinatorics;
 Mathematics - Optimization and Control