Controlling false discoveries in Bayesian gene networks with lasso regression p-values
Abstract
Bayesian networks can represent directed gene regulation and are therefore favored over co-expression networks. However, hardly any Bayesian network study addresses the false discovery control (FDC) of network edges, which leads to low accuracy through systematic biases when false discovery levels are inconsistent within the same study. We design four empirical tests to examine the FDC of Bayesian networks built from three p-value-based lasso regression variable selection methods: two existing ones and one we originate. Our method, lassopv, computes p-values for the critical regularization strength at which a predictor starts to contribute to lasso regression. Using null and Geuvadis datasets, we find that lassopv obtains optimal FDC in Bayesian gene networks, whereas existing methods produce defective p-values. The FDC concept and tests extend to most network inference scenarios and will guide the design and improvement of new and existing methods. Our novel variable selection method with lasso regression also enables FDC on other datasets and questions, even beyond network inference and computational biology. Lassopv is implemented in R and freely available at https://github.com/lingfeiwang/lassopv and https://cran.r-project.org/package=lassopv
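The quantity at the heart of lassopv, the critical regularization strength at which a predictor first enters the lasso, can be illustrated with a short Python sketch. This is not the lassopv R implementation. It approximates the critical strength on a finite, decreasing grid of regularization values solved by coordinate descent, and it swaps in a permutation null (shuffling one standardized predictor while keeping the target and the other predictors fixed) in place of whatever null distribution the package itself uses. The function names `entry_alpha` and `permutation_pvalue`, as well as the grid size and permutation count, are hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def standardize(X):
    """Center each column and scale it so that x_j^T x_j = n."""
    Xc = X - X.mean(axis=0)
    return Xc / Xc.std(axis=0)


def entry_alpha(X, y, j, alphas, tol=1e-8, max_sweeps=1000):
    """Largest grid alpha at which lasso coefficient j is nonzero (hypothetical helper).

    Solves min_b (1/2n)||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate
    descent over a decreasing alpha grid with warm starts, and stops as soon
    as predictor j enters. Returns 0.0 if j never enters on the grid.
    Assumes X is standardized and y is centered.
    """
    n, p = X.shape
    G = X.T @ X / n  # Gram matrix; diagonal entries are 1 after standardizing
    c = X.T @ y / n  # covariances between each predictor and the target
    b = np.zeros(p)
    for alpha in alphas:
        for _ in range(max_sweeps):
            max_change = 0.0
            for k in range(p):
                # correlation of predictor k with the partial residual
                rho = c[k] - G[k] @ b + G[k, k] * b[k]
                new = soft_threshold(rho, alpha) / G[k, k]
                max_change = max(max_change, abs(new - b[k]))
                b[k] = new
            if max_change < tol:
                break
        if b[j] != 0.0:
            return alpha
    return 0.0


def permutation_pvalue(X, y, j, n_alphas=40, n_perm=49, seed=0):
    """Permutation p-value for predictor j's entry alpha (assumed null, not lassopv's)."""
    rng = np.random.default_rng(seed)
    X = standardize(X)
    y = y - y.mean()
    # By Cauchy-Schwarz, |x_j^T y| / n <= sqrt(mean(y^2)) for any standardized
    # column, so one fixed grid covers the observed and all permuted datasets.
    top = np.sqrt(np.mean(y ** 2))
    alphas = np.geomspace(top, 1e-3 * top, n_alphas)
    observed = entry_alpha(X, y, j, alphas)
    exceed = 0
    Xp = X.copy()
    for _ in range(n_perm):
        Xp[:, j] = rng.permutation(X[:, j])  # break any link between x_j and y
        if entry_alpha(Xp, y, j, alphas) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2.0 * X[:, 0] + rng.normal(size=100)  # only predictor 0 drives the target
print(permutation_pvalue(X, y, 0))
print(permutation_pvalue(X, y, 3))
```

A predictor with a genuine effect enters at a much larger regularization strength than its shuffled copies and so receives a small p-value, while a pure-noise predictor does not. The grid resolution limits how finely entry points are distinguished; an exact path algorithm such as LARS would remove that approximation.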
- Publication: arXiv e-prints
- Pub Date: January 2017
- DOI: 10.48550/arXiv.1701.07011
- arXiv: arXiv:1701.07011
- Bibcode: 2017arXiv170107011W
- Keywords: Statistics - Methodology; Quantitative Biology - Molecular Networks; Quantitative Biology - Quantitative Methods; Statistics - Applications
- E-Print: 9 pages, 6 figures, 3 tables. Supplementary info: 2 pages