Smoothing spline ANOVA for super-large samples: Scalable computation via rounding parameters
Abstract
In the current era of big data, researchers routinely collect and analyze data with super-large sample sizes. Data-oriented statistical methods have been developed to extract information from such data. Smoothing spline ANOVA (SSANOVA) is a promising approach for extracting information from noisy data; however, its heavy computational cost hinders its wide application. In this paper, we propose a new algorithm for fitting SSANOVA models to super-large samples. The algorithm introduces rounding parameters that make the computation scalable. To demonstrate the benefits of the rounding parameters, we present a simulation study and a real-data example using electroencephalography (EEG) data. Our results show that, with the rounding parameters, a researcher can fit nonparametric regression models to very large samples within a few seconds on a standard laptop or tablet computer.
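A minimal sketch of why rounding can make the computation scalable follows. It assumes that "rounding" means snapping each predictor (already rescaled to [0, 1]) to the nearest multiple of a rounding parameter r, so the number of distinct predictor values is bounded by about 1/r + 1 no matter how large n is. After rounding, a penalized least-squares fit depends on the data only through the distinct values, their counts, and the mean response at each value, so the full-sample fit can be reproduced exactly from a much smaller weighted problem. To keep the example short, the smoother is a swapped-in stand-in (a penalized linear B-spline regression with a second-difference penalty, P-spline style), not the reproducing-kernel SSANOVA fit from the paper. The helper names `round_predictor`, `compress`, `hat_basis`, and `penalized_fit`, and the values r = 0.01, 20 knots, and lambda = 10, are hypothetical illustration choices.

```python
import numpy as np


def round_predictor(x, r):
    """Snap x (assumed already scaled to [0, 1]) to the nearest multiple of r."""
    return np.round(x / r) * r


def compress(x, y):
    """Collapse rows sharing the same x into (unique x, counts, mean y)."""
    ux, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    ybar = np.bincount(inv.ravel(), weights=y) / counts
    return ux, counts.astype(float), ybar


def hat_basis(x, n_knots):
    """Linear B-spline (hat-function) basis on equally spaced knots over [0, 1]."""
    knots = np.linspace(0.0, 1.0, n_knots)
    h = knots[1] - knots[0]
    return np.clip(1.0 - np.abs(x[:, None] - knots[None, :]) / h, 0.0, None)


def penalized_fit(x, y, w, lam, n_knots=20):
    """Weighted least squares with a second-difference penalty on the coefficients.

    Stand-in smoother for illustration only; not the SSANOVA estimator.
    """
    X = hat_basis(x, n_knots)
    D = np.diff(np.eye(n_knots), n=2, axis=0)  # (n_knots - 2) x n_knots difference matrix
    XtW = X.T * w  # scales column i of X.T by weight w[i], i.e. forms X'W
    return np.linalg.solve(XtW @ X + lam * D.T @ D, XtW @ y)


rng = np.random.default_rng(1)
n = 100_000
x = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)

xr = round_predictor(x, 0.01)  # at most 101 distinct values
# Fit on all n rounded observations with unit weights.
coef_full = penalized_fit(xr, y, np.ones(n), lam=10.0)
# Fit on the compressed data: one row per distinct value, weighted by its count.
ux, w, ybar = compress(xr, y)
coef_small = penalized_fit(ux, ybar, w, lam=10.0)
print(len(ux), np.allclose(coef_full, coef_small))
```

The two fits agree because, for rows sharing a rounded value u with count w_u, the sum of squared residuals equals w_u (ybar_u - f(u))^2 plus a term that does not depend on f. The compressed problem has one row per distinct rounded value rather than n rows, so once the single counting pass is done, its cost no longer grows with the sample size.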
- Publication: arXiv e-prints
- Pub Date: February 2016
- DOI: 10.48550/arXiv.1602.05208
- arXiv: arXiv:1602.05208
- Bibcode: 2016arXiv160205208H
- Keywords: Statistics - Computation
- E-Print: 22 pages, 7 figures