Automated Selection of Post-Strata using a Model-Assisted Regression Tree Estimator
Abstract
Auxiliary information can increase the efficiency of survey estimators through an assisting model when the model captures some of the relationship between the auxiliary data and the study variables. Despite their superior properties, model-assisted estimators are rarely used in anything but their simplest form by statistical agencies to produce official statistics. This is because the more complicated models that have been used in model-assisted estimation are often ill-suited to the available auxiliary data. Under a model-assisted framework, we propose a regression tree estimator for a finite population total. Regression tree models are adept at handling the type of auxiliary data usually available in the sampling frame and provide a model that is easy to explain and justify. The estimator can be viewed as a post-stratification estimator in which the post-strata are selected automatically by the recursive partitioning algorithm of the regression tree. We establish consistency of the regression tree estimator and compare its performance to other survey estimators using data from the US Bureau of Labor Statistics Occupational Employment Statistics Survey.
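The post-stratification view described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`best_split`, `poststratified_total`), the single-split "tree", the split criterion (weighted within-group sum of squares), and the toy data are all assumptions made for illustration. The sketch assumes one numeric auxiliary variable known for every population unit and design weights equal to inverse inclusion probabilities.

```python
from collections import defaultdict


def best_split(x, y, w, min_leaf=2):
    """Pick one threshold on x that minimises the weighted within-group
    sum of squared errors (a one-split stand-in for a regression tree)."""
    def wsse(idx):
        sw = sum(w[i] for i in idx)
        mean = sum(w[i] * y[i] for i in idx) / sw
        return sum(w[i] * (y[i] - mean) ** 2 for i in idx)

    order = sorted(range(len(x)), key=lambda i: x[i])
    best_cut, best_cost = None, float("inf")
    for k in range(min_leaf, len(order) - min_leaf + 1):
        lo, hi = order[k - 1], order[k]
        if x[lo] == x[hi]:
            continue  # cannot split between tied values
        cost = wsse(order[:k]) + wsse(order[k:])
        if cost < best_cost:
            best_cut, best_cost = (x[lo] + x[hi]) / 2, cost
    return best_cut


def poststratified_total(y, w, sample_strata, pop_counts):
    """Estimate the population total as the sum over post-strata of
    (population count) x (design-weighted sample mean in that stratum)."""
    sum_wy = defaultdict(float)
    sum_w = defaultdict(float)
    for yi, wi, g in zip(y, w, sample_strata):
        sum_wy[g] += wi * yi
        sum_w[g] += wi
    return sum(n * sum_wy[g] / sum_w[g] for g, n in pop_counts.items())


# Toy data (hypothetical): auxiliary x is known for all 10 population units,
# the study variable y is observed only for the 5 sampled units.
x_pop = [1, 2, 3, 4, 10, 11, 12, 13, 14, 15]
x_s = [1, 3, 10, 12, 14]
y_s = [2.0, 4.0, 20.0, 22.0, 24.0]
w_s = [2.0] * 5  # equal inclusion probability 5/10, so each weight is 2

cut = best_split(x_s, y_s, w_s)
strata_s = [int(xi > cut) for xi in x_s]

pop_counts = defaultdict(int)
for xi in x_pop:
    pop_counts[int(xi > cut)] += 1

total_hat = poststratified_total(y_s, w_s, strata_s, pop_counts)
```

In this toy case the split separates the small and large values of the auxiliary variable, and the estimate is the population count in each group times that group's weighted sample mean. A real regression tree would split recursively on several variables, and a minimum leaf size is what keeps every post-stratum from ending up without sampled units.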
- Publication: arXiv e-prints
- Pub Date: December 2017
- arXiv: arXiv:1712.05708
- Bibcode: 2017arXiv171205708M
- Keywords: Statistics - Methodology; 62D05
- E-Print: 13 pages, 3 figures