A synopsis of comparative metrics for classifications
Abstract
Phylogeny is the study of the relations between biological entities. From it arose the need to compare tree-like graphs, and several metrics have been established and studied; since there is no definitive way to compare such graphs, the discussion remains open. Each metric emphasizes different features of the structures, and the efficiency of computing them also varies. The work in this article is mainly expository (drawn from a collection of papers and articles), with special care given to presentation (mathematically formalizing what was not previously presented that way) and with original work filling in where information was, to our knowledge, not available within the frame we set for these metrics, namely stating their discriminative power and time complexity. The Robinson Foulds, Robinson Foulds Length, Quartet, Triplet, Triplet Length and Geodesic metrics are treated in greater detail (including some problems in their formulation and a discussion of their intricacies), but the reader can also expect coverage of less used (though not necessarily less important or less promising) metrics: Maximum Agreement Subtree, Align, Cophenetic Correlation Coefficient, Node, Similarity Based on Probability, Hybridization Number and Subtree Prune and Regraft. Finally, some challenges that arose while making this synopsis are presented as possible subjects of study and research.
- Publication: arXiv e-prints
- Pub Date: April 2018
- DOI: 10.48550/arXiv.1804.03929
- arXiv: arXiv:1804.03929
- Bibcode: 2018arXiv180403929L
- Keywords: Computer Science - Data Structures and Algorithms; Quantitative Biology - Quantitative Methods; 68W01 (Primary) 92B10 (Secondary)
- E-Print: 37 pages, 13 figures. Part of author's MSc thesis