When to encourage using Gaussian regression for feature selection tasks with time-to-event outcome
Abstract
IMPORTANCE: Feature selection with respect to time-to-event outcomes is one of the fundamental problems in clinical trials and biomarker discovery studies. However, it is unclear which statistical methods should be used when the sample size is small or when some of the key covariates are not measured.

DESIGN: In this simulation study, the true models are multivariate Cox proportional hazards models with 10 covariates. Only 5 of the 10 true features are assumed to be observed (measured) and available for model fitting, along with 5 random noise features. Each sample-size scenario is explored using 10,000 simulated datasets. Eight regression models, including both regularized Gaussian regression (elastic net penalty) and regularized Cox regression (glmnet Cox), are applied to each dataset to estimate feature effects.

RESULTS: If the covariates are highly correlated and Gaussian-distributed, the Gaussian regression of log-transformed survival time with only two covariates outperforms all tested Cox regression models when the total number of events is below 500.
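A minimal sketch of the kind of simulation described under DESIGN may help make the setup concrete. It rests on several assumptions that the abstract does not state: a constant (exponential) baseline hazard, equicorrelated standard Gaussian covariates, no censoring (so every subject has an event), hypothetical effect sizes, and the first five true features playing the role of the observed ones. Under an exponential baseline hazard $\lambda$, the model implies $\log T = -\log\lambda - x^\top\beta + \log E$ with $E \sim \text{Exp}(1)$, so a linear (Gaussian) regression of $\log T$ on $x$ estimates $-\beta$. The sketch uses plain ordinary least squares in place of the elastic-net-penalized Gaussian regression used in the study, and the function names are hypothetical; it is not the authors' code.

```python
import numpy as np


def simulate_cox_exponential(n, beta, rho, baseline_rate=0.1, seed=0):
    """Hypothetical Cox PH simulator with a constant baseline hazard.

    Covariates are standard Gaussians with common pairwise correlation rho
    (rho must lie in (-1/(p-1), 1) for the covariance to be valid).
    No censoring is applied (assumption), so every subject has an event.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    p = beta.size
    cov = np.full((p, p), rho)
    np.fill_diagonal(cov, 1.0)
    x = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Equivalent to inverse-transform sampling: if E ~ Exp(1), then
    # T = E / (lambda * exp(x'beta)) has hazard lambda * exp(x'beta).
    e = rng.exponential(scale=1.0, size=n)
    t = e / (baseline_rate * np.exp(x @ beta))
    return x, t


def fit_log_time_ols(x, t):
    """Unpenalized least squares of log(T) on x; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(t)), x])
    coef, *_ = np.linalg.lstsq(design, np.log(t), rcond=None)
    return coef


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    beta = np.linspace(-0.5, 0.5, 10)  # hypothetical true log-hazard ratios
    x, t = simulate_cox_exponential(1000, beta, rho=0.7, seed=42)
    # Partially observed design: 5 of the 10 true features (here, arbitrarily,
    # the first five) plus 5 pure-noise features.
    x_obs = np.column_stack([x[:, :5], rng.standard_normal((len(t), 5))])
    coef = fit_log_time_ols(x_obs, t)
    print("slopes on observed features:", np.round(coef[1:], 3))
```

Because the five unobserved true features are correlated with the observed ones, the slopes fitted on `x_obs` absorb part of the omitted features' effects (omitted-variable bias), which is the kind of misspecification the study is designed to probe.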
- Publication: arXiv e-prints
- Pub Date: October 2022
- arXiv: arXiv:2210.04409
- Bibcode: 2022arXiv221004409L
- Keywords: Statistics - Methodology
- E-Print: arXiv admin note: text overlap with arXiv:2208.09689