Ensemble gene selection for cancer classification
Abstract
Cancer diagnosis is an important emerging clinical application of microarray data. Its accurate prediction to the type or size of tumors relies on adopting powerful and reliable classification models, so as to patients can be provided with better treatment or response to therapy. However, the high dimensionality of microarray data may bring some disadvantages, such as over-fitting, poor performance and low efficiency, to traditional classification models. Thus, one of the challenging tasks in cancer diagnosis is how to identify salient expression genes from thousands of genes in microarray data that can directly contribute to the phenotype or symptom of disease. In this paper, we propose a new ensemble gene selection method (EGS) to choose multiple gene subsets for classification purpose, where the significant degree of gene is measured by conditional mutual information or its normalized form. After different gene subsets have been obtained by setting different starting points of the search procedure, they will be used to train multiple base classifiers and then aggregated into a consensus classifier by the manner of majority voting. The proposed method is compared with five popular gene selection methods on six public microarray datasets and the comparison results show that our method works well.
- Publication:
-
Pattern Recognition
- Pub Date:
- 2010
- DOI:
- Bibcode:
- 2010PatRe..43.2763L
- Keywords:
-
- Gene selection;
- Ensemble learning;
- Cancer classification;
- Mutual information;
- Microarray data