Causal Discovery on Dependent Binary Data

doi:10.48550/arXiv.2412.20289

The assumption of independence between observations (units) in a dataset is prevalent across methodologies for learning causal graphical models. However, this assumption often conflicts with real-world data, posing challenges to accurate structure learning. We propose a decorrelation-based approach for causal graph learning on dependent binary data, where the local conditional distribution is defined by a latent utility model with dependent errors across units. We develop a pairwise maximum likelihood method to estimate the covariance matrix for the dependence among the units. Then, leveraging the estimated covariance matrix, we develop an EM-like iterative algorithm to generate and decorrelate samples of the latent utility variables, which serve as decorrelated data. Any standard causal discovery method can be applied to the decorrelated data to learn the underlying causal graph. We demonstrate, through numerical experiments on both synthetic and real-world datasets, that the proposed decorrelation approach significantly improves the accuracy of causal graph learning.
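
The decorrelation idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the latent utilities Z and the between-unit covariance Sigma are already known, whereas the paper estimates Sigma by pairwise maximum likelihood and samples the latent utilities inside an EM-like loop. The function name decorrelate_units, the AR(1)-style Sigma, and the dimensions n and p are hypothetical choices made for illustration only. The sketch shows a single fact: if the rows (units) of Z have covariance Sigma = L L^T, then L^{-1} Z has uncorrelated rows.

```python
import numpy as np


def decorrelate_units(Z, Sigma):
    """Whiten Z across units (rows): return L^{-1} Z, where Sigma = L L^T.

    Hypothetical helper for illustration; assumes Sigma is positive definite.
    """
    L = np.linalg.cholesky(Sigma)   # lower-triangular Cholesky factor
    return np.linalg.solve(L, Z)    # solves L @ out = Z without forming L^{-1}


rng = np.random.default_rng(0)
n, p = 200, 3                       # n units, p variables (assumed sizes)

# Assumed dependence structure among units: AR(1)-style covariance.
idx = np.arange(n)
Sigma = 0.8 ** np.abs(idx[:, None] - idx[None, :])

# Independent errors E, made dependent across units by the Cholesky factor.
L = np.linalg.cholesky(Sigma)
E = rng.standard_normal((n, p))
Z = L @ E                           # latent utilities with dependent rows

# Observed binary data: only the sign of each latent utility is seen.
X = (Z > 0).astype(int)

# With Z and Sigma known, whitening recovers the independent errors.
Z_tilde = decorrelate_units(Z, Sigma)
```

In this idealized setting Z_tilde equals the independent errors E up to floating-point error. In practice only the binary matrix X is observed, which is why the abstract describes estimating the covariance matrix first and then generating samples of the latent utilities before decorrelating them.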

Publication: arXiv e-prints
Pub Date: December 2024
DOI: 10.48550/arXiv.2412.20289
arXiv: arXiv:2412.20289
Bibcode: 2024arXiv241220289C
Keywords: Computer Science - Machine Learning; Statistics - Methodology; Statistics - Machine Learning
