Classification and regression tree methods for incomplete data from sample surveys
Abstract
Analysis of sample survey data often requires adjustments to account for missing data in the outcome variables of principal interest. Standard adjustment methods based on item imputation or on propensity weighting factors rely heavily on the availability of auxiliary variables for both responding and non-responding units. Applying these adjustment methods can be especially challenging when the auxiliary variables are numerous and are themselves subject to substantial incomplete-data problems. This paper shows how classification and regression trees and forests can overcome some of these computational difficulties. An in-depth simulation study, based on incomplete-data patterns encountered in the U.S. Consumer Expenditure Survey, compares the tree- and forest-based methods with two standard methods for estimating a population mean in terms of bias, mean squared error, computational speed, and the number of variables that can be analyzed.
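As a rough illustration of how a tree can feed a propensity-weighting adjustment, the sketch below grows a small classification tree for the response indicator on a single numeric auxiliary variable, uses each leaf's response rate as the estimated response propensity, and weights respondents by its inverse to estimate the population mean. This is a minimal sketch under stated assumptions (one fully observed auxiliary variable, equal design weights, Gini splits, and hypothetical helper names such as `fit_tree` and `propensity_weighted_mean`); it is not the estimator or software used in the paper.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 response indicators."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(xs, rs, min_leaf):
    """Threshold on x that most reduces size-weighted Gini impurity, or None."""
    best_t, best_score = None, gini(rs) * len(rs)
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # enforce a minimum leaf size
        score = gini(left) * len(left) + gini(right) * len(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t


def fit_tree(xs, rs, depth=2, min_leaf=5):
    """Grow a small tree for P(respond | x); each leaf stores its response rate."""
    t = best_split(xs, rs, min_leaf) if depth > 0 else None
    if t is None:
        return {"leaf": sum(rs) / len(rs)}
    left = [(x, r) for x, r in zip(xs, rs) if x <= t]
    right = [(x, r) for x, r in zip(xs, rs) if x > t]
    return {
        "t": t,
        "left": fit_tree([x for x, _ in left], [r for _, r in left], depth - 1, min_leaf),
        "right": fit_tree([x for x, _ in right], [r for _, r in right], depth - 1, min_leaf),
    }


def predict(tree, x):
    """Follow the splits down to a leaf and return its estimated response propensity."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["t"] else tree["right"]
    return tree["leaf"]


def propensity_weighted_mean(xs, rs, ys, **tree_args):
    """Inverse-propensity-weighted mean of y over respondents (rs[i] == 1)."""
    tree = fit_tree(xs, rs, **tree_args)
    num = den = 0.0
    for x, r, y in zip(xs, rs, ys):
        if r:  # y is observed only for respondents
            w = 1.0 / predict(tree, x)
            num += w * y
            den += w
    return num / den


# Toy data: units with x >= 10 always respond; the others respond only when x is even.
xs = list(range(20))
rs = [1 if (x >= 10 or x % 2 == 0) else 0 for x in xs]
ys = [(10.0 if x >= 10 else 0.0) if r else None for x, r in zip(xs, rs)]
print(propensity_weighted_mean(xs, rs, ys))
```

Because every respondent in a leaf receives the same weight, this estimator reduces to a weighting-class adjustment with the leaves as cells. In the toy data the respondent-only mean is about 6.67, while the weighted estimate recovers the full-population mean of 5.0, since low-x units with y = 0 are under-represented among respondents.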
- Publication: arXiv e-prints
- Pub Date: March 2016
- arXiv: arXiv:1603.01631
- Bibcode: 2016arXiv160301631L
- Keywords: Statistics - Methodology