A Semi-Supervised Generative Adversarial Network for Prediction of Genetic Disease Outcomes
Abstract
For most diseases, building large databases of labeled genetic data is expensive and time-consuming. To address this, we introduce genetic Generative Adversarial Networks (gGAN), a semi-supervised approach based on a novel GAN architecture that creates large synthetic genetic datasets from a small amount of labeled data and a large amount of unlabeled data. Our goal is to determine, from genetic profile alone, the propensity of a new individual to develop the severe form of the disease. The proposed model achieved satisfactory results on real genetic data from different datasets and populations, including settings where the test population's genetic profile may differ from that of the training population. The model is also self-aware: it can determine whether a new genetic profile is sufficiently compatible with the data on which the network was trained, and is therefore suitable for prediction. The code and datasets used can be found at https://github.com/caio-davi/gGAN.
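The abstract does not describe the gGAN architecture itself. For orientation only, here is a minimal sketch of the generic K+1-class semi-supervised GAN discriminator objective from Salimans et al., "Improved Techniques for Training GANs" (2016), which is a standard way for a GAN discriminator to learn from both labeled and unlabeled examples. It is an assumption that gGAN resembles this formulation; this is not the authors' implementation, and the function names are hypothetical. Each logits array holds one score per real class, and the extra "generated" class logit is fixed at 0, so the probability that an input is real is D(x) = Z(x) / (Z(x) + 1) with Z(x) = sum_k exp(l_k(x)).

```python
import numpy as np

# Illustrative K+1 semi-supervised GAN discriminator loss (Salimans et al., 2016).
# Hypothetical helper names; NOT taken from the gGAN repository.


def logsumexp(x, axis=-1):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def softplus(x):
    """Stable log(1 + exp(x))."""
    return np.logaddexp(0.0, x)


def ssgan_discriminator_loss(logits_lab, labels, logits_unl, logits_fake):
    """Total discriminator loss; every logits array has shape (batch, K)."""
    # Supervised term: ordinary K-way cross-entropy on the labeled batch.
    lse_lab = logsumexp(logits_lab)
    picked = logits_lab[np.arange(len(labels)), labels]
    loss_sup = np.mean(lse_lab - picked)

    # Unsupervised term on real unlabeled data:
    # -log D(x) = -log sigmoid(logsumexp) = softplus(-logsumexp).
    loss_real = np.mean(softplus(-logsumexp(logits_unl)))

    # Unsupervised term on generated data:
    # -log(1 - D(G(z))) = log(Z + 1) = softplus(logsumexp).
    loss_fake = np.mean(softplus(logsumexp(logits_fake)))

    return loss_sup + loss_real + loss_fake


# Usage: with K = 2 classes and all-zero logits, each term is a simple log.
zeros = np.zeros((1, 2))
loss = ssgan_discriminator_loss(zeros, np.array([0]), zeros, zeros)
print(loss)  # log 2 + log 1.5 + log 3 = log 9
```

The design choice of fixing the generated-class logit at 0 removes a redundant degree of freedom: the K real-class logits alone determine both the class prediction (via softmax) and the real-versus-generated decision (via their log-sum-exp).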
- Publication: arXiv e-prints
- Pub Date: July 2020
- DOI: 10.48550/arXiv.2007.01200
- arXiv: arXiv:2007.01200
- Bibcode: 2020arXiv200701200D
- Keywords: Computer Science - Machine Learning; Quantitative Biology - Genomics; Statistics - Machine Learning; I.5