Read classification using semi-supervised deep learning
Abstract
In this paper, we propose a semi-supervised deep learning method for detecting the specific types of reads that impede the de novo genome assembly process. Instead of working directly with the sequenced reads, we analyze their coverage graphs converted to 1D signals. We observed that specific signal patterns occur in each relevant class of reads. We chose a semi-supervised approach because manually labeling the data is slow and tedious, and our goal was to facilitate the assembly process with as little labeled data as possible. We tested two models to learn patterns in the coverage graphs: M1+M2 and semi-GAN. We evaluated each model on a manually labeled dataset comprising various reads from multiple reference genomes, measuring performance as a function of the number of labeled examples used during training. In addition, we embedded our detection method in the assembly process, which improved the quality of the assemblies.
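As an illustration of what "coverage graphs converted to 1D signals" could look like in practice, the following is a minimal sketch and not the authors' pipeline. It assumes each read comes with a list of half-open overlap intervals `[start, end)` against other reads, counts how many overlaps cover each base, and then resamples the result to a fixed length so that reads of different lengths yield equally sized inputs. The function names `coverage_signal` and `resample`, the interval representation, and the resampling step are all hypothetical choices made for this example.

```python
from itertools import accumulate


def coverage_signal(read_length, overlaps):
    """Per-base coverage of one read.

    `overlaps` is an iterable of half-open (start, end) intervals on the
    read. This input format is an assumption made for illustration.
    """
    # Difference array: +1 where an overlap starts, -1 where it ends.
    diff = [0] * (read_length + 1)
    for start, end in overlaps:
        start = max(0, start)          # clip to the read boundaries
        end = min(read_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    # The running sum turns the difference array into coverage counts.
    return list(accumulate(diff[:-1]))


def resample(signal, size):
    """Resample a signal to `size` points by linear interpolation.

    Fixed-length resampling is an assumption of this sketch, not a step
    described in the abstract.
    """
    if not signal or size < 1:
        raise ValueError("signal must be non-empty and size must be >= 1")
    if len(signal) == 1 or size == 1:
        return [float(signal[0])] * size
    scale = (len(signal) - 1) / (size - 1)
    out = []
    for i in range(size):
        pos = i * scale
        left = min(int(pos), len(signal) - 2)
        frac = pos - left
        out.append(signal[left] * (1 - frac) + signal[left + 1] * frac)
    return out


if __name__ == "__main__":
    cov = coverage_signal(10, [(0, 5), (3, 10)])
    print(cov)
    print(resample(cov, 5))
```

A sharp drop or spike in such a signal is the kind of pattern a classifier could pick up on; which patterns correspond to which read classes is determined by the paper's labeled data, not by this sketch.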
- Publication:
- arXiv e-prints
- Pub Date:
- April 2019
- DOI:
- 10.48550/arXiv.1904.10353
- arXiv:
- arXiv:1904.10353
- Bibcode:
- 2019arXiv190410353S
- Keywords:
- Computer Science - Machine Learning;
- Quantitative Biology - Genomics;
- Statistics - Machine Learning
- E-Print:
- 2nd International Workshop on Deep Learning for Precision Medicine, ECML-PKDD, 2017, Skopje, North Macedonia