Computational Tools for Evaluating Phylogenetic and Hierarchical Clustering Trees
Abstract
Inferential summaries of tree estimates are useful in the setting of evolutionary biology, where phylogenetic trees have been built from DNA data since the 1960s. In bioinformatics, psychometrics and data mining, hierarchical clustering techniques output the same mathematical objects, and practitioners have similar questions about the stability and "generalizability" of these summaries. This paper provides an implementation of the geometric distance between trees developed by Billera, Holmes and Vogtmann (2001) [BHV], which is equally applicable to phylogenetic trees and hierarchical clustering trees, and shows some of the applications in statistical inference for which this distance can be useful. In particular, since BHV have shown that the space of trees is negatively curved (a CAT(0) space), a natural representation of a collection of trees is a tree. We compare this representation to the Euclidean approximations of treespace made available through Multidimensional Scaling of the matrix of distances between trees. We also provide applications of the distances between trees to hierarchical clustering trees constructed from microarrays. Our method gives a new way of evaluating the influence both of certain columns (positions, variables or genes) and of certain rows (whether species, observations or arrays).
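As an illustration of the Euclidean approximation mentioned above, the following is a minimal sketch of classical (Torgerson) Multidimensional Scaling applied to a matrix of pairwise distances. It is not the paper's implementation. The function name `classical_mds` and the toy matrix are hypothetical, and the toy distances are ordinary Euclidean distances between four points in the plane, standing in for distances between trees, which would have to be computed separately.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Embed n objects in R^k from an n x n symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues indicate distances that are not exactly Euclidean;
    # they are clipped to zero when building coordinates.
    coords = vecs * np.sqrt(np.clip(vals, 0.0, None))
    return coords, vals


# Hypothetical distance matrix (assumption): distances between the corners
# of a 3 x 4 rectangle, used here in place of real tree-to-tree distances.
toy = np.array([
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
])

coords, eigvals = classical_mds(toy, k=2)
print(coords)
```

Because the toy distances are exactly Euclidean in two dimensions, the embedding reproduces them up to rotation and reflection. For distances that are not Euclidean, some eigenvalues of the centered matrix are typically negative, and the size of those eigenvalues gives a rough indication of how much the planar picture distorts the original distances.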
- Publication: arXiv e-prints
- Pub Date: June 2010
- arXiv: arXiv:1006.1015
- Bibcode: 2010arXiv1006.1015C
- Keywords: Statistics - Applications; Quantitative Biology - Populations and Evolution; Statistics - Computation; 62-09; 92-08
- E-Print: 25 pages, 14 figures