Distributions and Statistical Power of Optimal Signal-Detection Methods In Finite Cases
Abstract
In big data analysis for detecting rare and weak signals among $n$ features, several grouping-test methods, such as the Higher Criticism test (HC), the Berk-Jones test (B-J), and the $\phi$-divergence test, share a similar asymptotic optimality as $n \rightarrow \infty$. In practical data analysis, however, $n$ is frequently small or, at most, moderately large. To apply these optimal tests properly and to choose wisely among them in practical studies, it is important to know how to obtain their p-values and statistical power. To address this problem in an even broader context, this paper provides analytical solutions for a general family of goodness-of-fit (GOF) tests, which covers these optimal tests. For any given i.i.d. continuous distributions of the input test statistics of the $n$ features, both the p-value and the statistical power of such a GOF test can be calculated. Using these calculations, we compared the finite-sample performance of the asymptotically optimal tests under the normal mixture alternative. The results show that HC is the best choice when signals are rare, while B-J is more robust across various signal patterns. In an application to a real genome-wide association study, the results illustrate that the p-value calculation works well and that the optimal tests have the potential to detect novel disease genes with weak genetic effects. The calculations have been implemented in the R package SetTest, which is published on CRAN.
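To make the finite-$n$ p-value question concrete, the following is a minimal Python sketch, not the paper's analytical method and not the SetTest API. It assumes one common form of the HC statistic, $\max_i \sqrt{n}\,(i/n - p_{(i)}) / \sqrt{p_{(i)}(1 - p_{(i)})}$ over the sorted input p-values, and approximates its null p-value by Monte Carlo simulation under i.i.d. uniform p-values. The function names `higher_criticism` and `monte_carlo_pvalue`, the clipping constant, and the default simulation settings are illustrative assumptions.

```python
import math
import random


def higher_criticism(pvalues):
    """One common form of the HC statistic (an assumption; variants restrict the range of i)."""
    p = sorted(pvalues)
    n = len(p)
    best = -math.inf
    for i, pi in enumerate(p, start=1):
        # Clip away from 0 and 1 so the standardization never divides by zero.
        pi = min(max(pi, 1e-12), 1.0 - 1e-12)
        z = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
        best = max(best, z)
    return best


def monte_carlo_pvalue(observed, n, n_sim=2000, seed=0):
    """Simulation-based p-value under the global null of i.i.d. Uniform(0, 1) inputs.

    This swaps in Monte Carlo for the exact calculation described in the abstract.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_sim):
        null_p = [rng.random() for _ in range(n)]
        if higher_criticism(null_p) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical input p-values for n = 10 features.
    example = [0.0004, 0.003, 0.21, 0.35, 0.42, 0.58, 0.66, 0.74, 0.81, 0.93]
    hc = higher_criticism(example)
    print("HC statistic:", hc)
    print("Monte Carlo p-value:", monte_carlo_pvalue(hc, n=len(example)))
```

A simulation like this is only a rough check for small $n$; its resolution is limited to about $1/(n_{\text{sim}} + 1)$, which is one reason exact finite-sample calculations are useful for the very small p-values typical of genome-wide studies.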
- Publication: arXiv e-prints
- Pub Date: February 2017
- DOI: 10.48550/arXiv.1702.07082
- arXiv: arXiv:1702.07082
- Bibcode: 2017arXiv170207082Z
- Keywords: Mathematics - Statistics Theory
- E-Print: 37 pages, 9 figures, 1 table