dsLassoCov: a federated machine learning approach incorporating covariate control
Abstract
Machine learning has been widely adopted in biomedical research, fueled by the increasing availability of data. However, integrating datasets across institutions is challenging due to legal restrictions and data governance complexities. Federated learning allows the direct, privacy preserving training of machine learning models using geographically distributed datasets, but faces the challenge of how to appropriately control for covariate effects. The naive implementation of conventional covariate control methods in federated learning scenarios is often impractical due to the substantial communication costs, particularly with high-dimensional data. To address this issue, we introduce dsLassoCov, a machine learning approach designed to control for covariate effects and allow an efficient training in federated learning. In biomedical analysis, this allow the biomarker selection against the confounding effects. Using simulated data, we demonstrate that dsLassoCov can efficiently and effectively manage confounding effects during model training. In our real-world data analysis, we replicated a large-scale Exposome analysis using data from six geographically distinct databases, achieving results consistent with previous studies. By resolving the challenge of covariate control, our proposed approach can accelerate the application of federated learning in large-scale biomedical studies.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2024
- DOI:
- arXiv:
- arXiv:2412.07991
- Bibcode:
- 2024arXiv241207991C
- Keywords:
-
- Quantitative Biology - Quantitative Methods;
- Computer Science - Machine Learning
- E-Print:
- 17 pages, 5 figures