Adaptive Noisy Data Augmentation for Regularized Estimation and Inference in Generalized Linear Models
Abstract
We propose the AdaPtive Noise Augmentation (PANDA) procedure to regularize the estimation and inference of generalized linear models (GLMs). PANDA iteratively optimizes the objective function given noise augmented data until convergence to obtain the regularized model estimates. The augmented noises are designed to achieve various regularization effects, including $l_0$, bridge (lasso and ridge included), elastic net, adaptive lasso, and SCAD, as well as group lasso and fused ridge. We examine the tail bound of the noise-augmented loss function and establish the almost sure convergence of the noise-augmented loss function and its minimizer to the expected penalized loss function and its minimizer, respectively. We derive the asymptotic distributions for the regularized parameters, based on which, inferences can be obtained simultaneously with variable selection. PANDA exhibits ensemble learning behaviors that help further decrease the generalization error. Computationally, PANDA is easy to code, leveraging existing software for implementing GLMs, without resorting to complicated optimization techniques. We demonstrate the superior or similar performance of PANDA against the existing approaches of the same type of regularizers in simulated and real-life data. We show that the inferences through PANDA achieve nominal or near-nominal coverage and are far more efficient compared to a popular existing post-selection procedure.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2022
- DOI:
- arXiv:
- arXiv:2204.08574
- Bibcode:
- 2022arXiv220408574L
- Keywords:
-
- Statistics - Machine Learning;
- Computer Science - Machine Learning;
- Statistics - Methodology
- E-Print:
- Accepted by IEEE-COMPSAC 2022 Computers, Software &