A framework for large scale phylogenetic analysis
Abstract
With growing exchanges of people and merchandise between countries, epidemics have become an issue of increasing importance, and huge amounts of data are being collected every day. As a result, analyses that were usually run on personal computers and desktops are no longer feasible there, and it is now common to run such tasks in high-performance computing (HPC) environments and/or on dedicated systems. At the same time, these analyses often deal with graphs and trees and run algorithms to find patterns in such structures. Although graph-oriented databases and processing systems can be of much help in this setting, as far as we know there is no solution relying on these technologies to address the challenges of large-scale phylogenetic analysis. This project aims to develop a modular framework for large-scale phylogenetic analysis that exploits such technologies, namely Neo4j. We address this challenge by proposing and developing a framework, implemented as a Neo4j plugin, that represents large phylogenetic networks and trees together with ancillary data, supports queries on such data, and allows the deployment of algorithms for inferring/detecting patterns and pre-computing visualizations. The framework is innovative and brings several advantages to phylogenetic analysis: storing the phylogenetic trees avoids having to compute them again, and using multilayer networks makes the comparison between them more efficient and scalable. Experimental results show that it can be very efficient in the most frequently used operations and that the supported algorithms behave in line with their expected time complexity.
- Publication: arXiv e-prints
- Pub Date: December 2020
- DOI: 10.48550/arXiv.2012.13363
- arXiv: arXiv:2012.13363
- Bibcode: 2020arXiv201213363L
- Keywords: Quantitative Biology - Populations and Evolution