Double Machine Learning for Partially Linear Mixed-Effects Models with Repeated Measurements
Abstract
Traditionally, spline or kernel approaches in combination with parametric estimation are used to infer the linear coefficient (fixed effects) in a partially linear mixed-effects model for repeated measurements. Using machine learning algorithms allows us to incorporate complex interaction structures and high-dimensional variables. We employ double machine learning to cope with the nonparametric part of the partially linear mixed-effects model: the nonlinear variables are regressed out nonparametrically from both the linear variables and the response. This adjustment can be performed with any machine learning algorithm, for instance random forests, which allows to take complex interaction terms and nonsmooth structures into account. The adjusted variables satisfy a linear mixed-effects model, where the linear coefficient can be estimated with standard linear mixed-effects techniques. We prove that the estimated fixed effects coefficient converges at the parametric rate, is asymptotically Gaussian distributed, and semiparametrically efficient. Two simulation studies demonstrate that our method outperforms a penalized regression spline approach in terms of coverage. We also illustrate our proposed approach on a longitudinal dataset with HIV-infected individuals. Software code for our method is available in the R-package dmlalg.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2021
- DOI:
- 10.48550/arXiv.2108.13657
- arXiv:
- arXiv:2108.13657
- Bibcode:
- 2021arXiv210813657E
- Keywords:
-
- Statistics - Methodology;
- Mathematics - Statistics Theory;
- Statistics - Machine Learning
- E-Print:
- doi:10.1111/sjos.12639