Online Debiased Lasso for Streaming Data
Abstract
We propose an online debiased lasso (ODL) method for statistical inference in high-dimensional linear models with streaming data. The proposed ODL consists of an efficient computational algorithm for streaming data and approximately normal estimators for the regression coefficients. At each stage of the analysis, its implementation requires only the current data batch in the stream and sufficient statistics of the historical data. A dynamic procedure is developed to select and update the tuning parameters upon the arrival of each new data batch, so that the amount of regularization can be adjusted adaptively along the data stream. The asymptotic normality of the ODL estimator is established under conditions similar to those in the offline setting, together with mild conditions on the size of the data batches in the stream, which provides theoretical justification for the proposed online statistical inference procedure. We conduct extensive numerical experiments to evaluate the performance of ODL. These experiments demonstrate the effectiveness of our algorithm and support the theoretical results. An air quality dataset and an index fund dataset from the Hong Kong Stock Exchange are analyzed to illustrate the application of the proposed method.
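To make the idea of working from "the current data batch and sufficient statistics of the historical data" concrete, the sketch below shows how a plain lasso fit can be refreshed batch by batch using only the running quantities X^T X, X^T y, and the sample count. It is an illustration under stated assumptions, not the ODL algorithm itself: the class name `StreamingLasso`, the coordinate-descent solver, the fixed penalty `lam`, and the simulated data are all hypothetical choices made here, and the debiasing step and the adaptive tuning-parameter selection described in the abstract are omitted.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


class StreamingLasso:
    """Hypothetical helper: lasso refit from running sufficient statistics.

    Only S = sum of X_b^T X_b, r = sum of X_b^T y_b, and the sample count n
    are stored; raw historical batches are never kept.
    """

    def __init__(self, p):
        self.S = np.zeros((p, p))   # running Gram matrix X^T X
        self.r = np.zeros(p)        # running cross-product X^T y
        self.n = 0                  # number of observations seen so far
        self.beta = np.zeros(p)     # current estimate, reused as a warm start

    def update(self, X, y):
        """Fold one new batch into the sufficient statistics."""
        self.S += X.T @ X
        self.r += X.T @ y
        self.n += X.shape[0]

    def fit(self, lam, n_iter=200, tol=1e-8):
        """Coordinate descent for (1/(2n))||y - Xb||^2 + lam * ||b||_1.

        The objective depends on the data only through S / n and r / n.
        """
        G = self.S / self.n
        c = self.r / self.n
        beta = self.beta.copy()
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(len(beta)):
                if G[j, j] == 0.0:
                    continue  # column with no signal yet; leave at current value
                # Partial residual correlation with coordinate j removed.
                rho = c[j] - G[j] @ beta + G[j, j] * beta[j]
                new = soft_threshold(rho, lam) / G[j, j]
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
            if max_change < tol:
                break
        self.beta = beta
        return beta


def run_demo(seed=0):
    """Simulate a small stream (assumed sizes) and refit after each batch."""
    rng = np.random.default_rng(seed)
    p, batch_size, n_batches = 20, 40, 5
    beta_true = np.zeros(p)
    beta_true[:3] = [2.0, -1.5, 1.0]
    model = StreamingLasso(p)
    batches = []
    for _ in range(n_batches):
        X = rng.standard_normal((batch_size, p))
        y = X @ beta_true + 0.1 * rng.standard_normal(batch_size)
        model.update(X, y)       # historical data enter only via S, r, n
        model.fit(lam=0.1)       # fixed penalty here; ODL tunes it adaptively
        batches.append((X, y))   # kept only so the demo can be checked
    return model, beta_true, batches


if __name__ == "__main__":
    model, beta_true, _ = run_demo()
    print(np.round(model.beta, 3))
```

Because the squared-error loss is quadratic, the accumulated statistics carry the same information for the lasso objective as the pooled data, so the streamed fit targets the same minimizer as an offline fit on all batches. The resulting estimate is still shrunk toward zero, which is the bias that a debiasing correction such as the one proposed in the paper is meant to address.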
- Publication: arXiv e-prints
- Pub Date: June 2021
- arXiv: arXiv:2106.05925
- Bibcode: 2021arXiv210605925H
- Keywords: Mathematics - Statistics Theory
- E-Print: Ruijian Han and Lan Luo contributed equally to this work. Co-corresponding authors: Yuanyuan Lin (Email: ylin@sta.cuhk.edu.hk) and Jian Huang (Email: jian-huang@uiowa.edu)