Pathway-based feature selection algorithms identify genes discriminating patients with multiple sclerosis apart from controls
Abstract
Introduction: The analysis of microarray data, and the extraction of biological insight from it, has shifted from identifying individual genes associated with a phenotype to identifying biological pathways or gene sets. Meanwhile, feature selection algorithms have become imperative for coping with the high-dimensional nature of many modeling tasks in bioinformatics. Many feature selection algorithms use the information contained within a gene set as a biological prior and select relevant features by incorporating that information. An integration of gene set analysis with feature selection is therefore highly desirable. The significance analysis of microarray to gene-set reduction analysis (SAM-GSR) algorithm represents a novel direction in gene set analysis, aiming to further reduce a gene set to a core subset. Here, we explore the feature selection capability of SAM-GSR and then modify SAM-GSR specifically to better fulfill this role.

Results and Conclusions: Training on multiple sclerosis (MS) microarray data with both SAM-GSR and our modification of SAM-GSR achieved excellent discriminative performance on an independent test set. We conclude that incorporating biological information from a gene set may be helpful for classification and feature selection.

Discussion: Given that pathway information is far from complete, a statistical method capable of constructing biologically meaningful gene networks is in demand. The basic requirement is that the interplay among genes be taken into account.
- Publication: arXiv e-prints
- Pub Date: August 2015
- DOI: 10.48550/arXiv.1508.01509
- arXiv: arXiv:1508.01509
- Bibcode: 2015arXiv150801509Z
- Keywords: Quantitative Biology - Genomics