Optimal Sub-sampling with Influence Functions
Abstract
Sub-sampling is a common and often effective way to handle the computational challenges of large datasets. However, for most statistical models, there is no well-motivated approach for drawing a non-uniform subsample. We show that the concept of an asymptotically linear estimator, together with its associated influence function, leads to optimal sampling procedures for a wide class of popular models. Furthermore, for linear regression models, which have well-studied procedures for non-uniform sub-sampling, we show that our optimal influence-function-based method outperforms previous approaches. We empirically demonstrate the improved performance of our method on real datasets.
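To make the idea of influence-based sub-sampling concrete, the following is a minimal sketch for ordinary least squares. It is not taken from the paper: it assumes that sampling probabilities are set proportional to the norm of each point's influence function, that the influence is evaluated at a pilot estimate fitted on a small uniform subsample, and that the final fit uses inverse-probability weights. The function name `influence_subsample_ols` and the parameters `m`, `pilot_size`, and `seed` are hypothetical.

```python
import numpy as np


def influence_subsample_ols(X, y, m, pilot_size=200, seed=0):
    """Illustrative influence-based sub-sampling for OLS (a sketch, not the paper's code)."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape

    # Pilot fit on a small uniform subsample (assumption of this sketch).
    pilot = rng.choice(n, size=min(pilot_size, n), replace=False)
    beta0, *_ = np.linalg.lstsq(X[pilot], y[pilot], rcond=None)

    # OLS influence function: psi_i = Sigma^{-1} x_i (y_i - x_i^T beta),
    # with Sigma = E[x x^T] estimated here from the pilot rows.
    sigma = X[pilot].T @ X[pilot] / len(pilot)
    resid = y - X @ beta0
    psi = np.linalg.solve(sigma, X.T).T * resid[:, None]

    # Sampling probabilities proportional to the influence norm
    # (assumed rule; falls back to uniform if every norm is zero).
    norms = np.linalg.norm(psi, axis=1)
    total = norms.sum()
    probs = norms / total if total > 0 else np.full(n, 1.0 / n)

    # Draw m points with replacement and reweight by inverse probability
    # so that the weighted fit targets the full-data solution.
    idx = rng.choice(n, size=m, replace=True, p=probs)
    sw = np.sqrt(1.0 / (m * probs[idx]))
    beta, *_ = np.linalg.lstsq(X[idx] * sw[:, None], y[idx] * sw, rcond=None)
    return beta, probs
```

Calling `influence_subsample_ols(X, y, m=500)` returns the coefficient estimate from the weighted subsample together with the sampling probabilities. Points with very small influence receive large weights when they are drawn, so a practical version might mix these probabilities with a uniform component to limit the variance.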
- Publication: arXiv e-prints
- Pub Date: September 2017
- DOI: 10.48550/arXiv.1709.01716
- arXiv: arXiv:1709.01716
- Bibcode: 2017arXiv170901716T
- Keywords: Statistics - Machine Learning; Computer Science - Machine Learning