Metalearners for Estimating Heterogeneous Treatment Effects using Machine Learning
Abstract
There is growing interest in estimating and analyzing heterogeneous treatment effects in experimental and observational studies. We describe a number of metaalgorithms that can take advantage of any supervised learning or regression method in machine learning and statistics to estimate the Conditional Average Treatment Effect (CATE) function. Metaalgorithms build on base algorithms, such as Random Forests (RF), Bayesian Additive Regression Trees (BART), or neural networks, to estimate the CATE, a function that the base algorithms are not designed to estimate directly. We introduce a new metaalgorithm, the X-learner, that is provably efficient when the number of units in one treatment group is much larger than in the other, and can exploit structural properties of the CATE function. For example, if the CATE function is linear and the response functions in treatment and control are Lipschitz continuous, the X-learner can still achieve the parametric rate under regularity conditions. We then introduce versions of the X-learner that use RF and BART as base learners. In extensive simulation studies, the X-learner performs favorably, although none of the metalearners is uniformly the best. In two persuasion field experiments from political science, we demonstrate how our new X-learner can be used to target treatment regimes and to shed light on underlying mechanisms. A software package is provided that implements our methods.
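To make the idea of a metaalgorithm concrete, the following is a minimal sketch of the three-stage X-learner construction: fit the two response functions, impute unit-level treatment effects by crossing each group with the other group's model, then combine the two resulting CATE estimates with a weight. It is an illustration under stated assumptions, not the authors' software package. The base learner here is one-covariate ordinary least squares rather than RF or BART, the weight is a constant propensity, the data are synthetic and noise-free, and all names (`fit_line`, `x_learner`, `tau_hat`) are hypothetical.

```python
def fit_line(xs, ys):
    """Hypothetical base learner: least-squares fit of y = a + b*x.

    Any supervised regression method could be swapped in here.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def x_learner(xs, ws, ys, propensity):
    """Sketch of an X-learner with a constant propensity weight (assumption)."""
    x1 = [x for x, w in zip(xs, ws) if w == 1]
    y1 = [y for y, w in zip(ys, ws) if w == 1]
    x0 = [x for x, w in zip(xs, ws) if w == 0]
    y0 = [y for y, w in zip(ys, ws) if w == 0]

    # Stage 1: estimate the response functions in each group.
    mu1 = fit_line(x1, y1)
    mu0 = fit_line(x0, y0)

    # Stage 2: impute individual effects using the *other* group's model,
    # then regress the imputed effects on the covariate.
    d1 = [y - mu0(x) for x, y in zip(x1, y1)]  # treated units
    d0 = [mu1(x) - y for x, y in zip(x0, y0)]  # control units
    tau1 = fit_line(x1, d1)
    tau0 = fit_line(x0, d0)

    # Stage 3: weighted combination. With few treated units (small
    # propensity), most weight goes to tau1, which relies on the
    # well-estimated control response mu0.
    return lambda x: propensity * tau0(x) + (1 - propensity) * tau1(x)


# Synthetic, noise-free, unbalanced data (assumption): 10% treated,
# control response 1 + 2x, true CATE 0.5 + x.
n = 200
xs = [i / (n - 1) for i in range(n)]
ws = [1 if i % 10 == 0 else 0 for i in range(n)]
ys = [1 + 2 * x + w * (0.5 + x) for x, w in zip(xs, ws)]

tau_hat = x_learner(xs, ws, ys, propensity=0.1)
print(tau_hat(0.4))
```

Because the synthetic responses are exactly linear and noise-free, the estimate should agree with the true CATE 0.5 + x up to floating-point error. With noisy data or nonlinear responses, the quality of the estimate depends on the chosen base learner.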
Publication: arXiv e-prints
Pub Date: June 2017
arXiv: arXiv:1706.03461
Bibcode: 2017arXiv170603461K
Keywords: Mathematics - Statistics Theory; Statistics - Methodology
E-Print: doi:10.1073/pnas.1804597116