A two-sample test for high-dimensional data with applications to gene-set testing
Abstract
We propose a two-sample test for the means of high-dimensional data when the data dimension is much larger than the sample size. Hotelling's classical $T^2$ test is not applicable in this "large $p$, small $n$" situation. The proposed test does not require explicit conditions on the relationship between the data dimension and the sample size, which offers considerable flexibility in analyzing high-dimensional data. One application of the proposed test is testing the significance of sets of genes, which we demonstrate in an empirical study on a leukemia data set.
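The abstract does not say why Hotelling's $T^2$ breaks down, but a standard explanation is that the statistic needs the inverse of the pooled sample covariance matrix, whose rank is at most $n_1 + n_2 - 2$, so it is singular once $p$ exceeds that bound. The following is a minimal sketch of that rank argument only, not the proposed test. The sample sizes, the dimension, the random integer data, and the helper names `centered` and `rank` are all illustrative assumptions.

```python
from fractions import Fraction
import random


def centered(sample):
    """Subtract the column means from every row, using exact fractions."""
    n = len(sample)
    means = [sum(col) / n for col in zip(*sample)]
    return [[v - m for v, m in zip(row, means)] for row in sample]


def rank(rows):
    """Rank of a matrix by Gaussian elimination in exact arithmetic."""
    rows = [list(r) for r in rows]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Hypothetical sizes chosen so that p > n1 + n2 - 2.
random.seed(0)
n1, n2, p = 4, 5, 12
x = [[Fraction(random.randint(-5, 5)) for _ in range(p)] for _ in range(n1)]
y = [[Fraction(random.randint(-5, 5)) for _ in range(p)] for _ in range(n2)]

# The pooled covariance is Z^T Z / (n1 + n2 - 2), where Z stacks the two
# centered samples; Z^T Z has the same rank as Z, so ranking Z is enough.
z = centered(x) + centered(y)
pooled_rank = rank(z)

# Each centered sample loses one degree of freedom, so the rank cannot
# exceed n1 + n2 - 2, which is below p here: the p x p pooled covariance
# is singular and cannot be inverted.
print(pooled_rank, "<=", n1 + n2 - 2, "<", p)
```

Exact rational arithmetic is used here so that the rank is not affected by floating-point tolerance; with a numerical library one would typically compute the rank from singular values instead.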
- Publication:
- arXiv e-prints
- Pub Date:
- February 2010
- DOI:
- 10.48550/arXiv.1002.4547
- arXiv:
- arXiv:1002.4547
- Bibcode:
- 2010arXiv1002.4547C
- Keywords:
- Mathematics - Statistics;
- 62H15, 60K35 (Primary); 62G10 (Secondary)
- E-Print:
- Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/09-AOS716