Latent Association Mining in Binary Data
Abstract
We consider the problem of identifying stable sets of mutually associated features in moderate- or high-dimensional binary data. In this context we develop and investigate a method called Latent Association Mining for Binary Data (LAMB). The LAMB method is based on a simple threshold model in which the observed binary values represent a random thresholding of a latent continuous vector that may have a complex association structure. We consider a measure of latent association that quantifies association in the latent continuous vector without bias due to the random thresholding. The LAMB method uses an iterative, testing-based search procedure to identify stable sets of mutually associated features. We compare the LAMB method with several competing methods on artificial binary-valued datasets and two real count-valued datasets. The LAMB method detects meaningful associations in these datasets. In the case of the count-valued datasets, the associations detected by the LAMB method are based only on information about whether the counts are zero or non-zero, yet the method is competitive with methods that have access to the full count data.
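To illustrate why thresholding matters, the following is a minimal simulation sketch of a threshold model of the kind the abstract describes. It is not the LAMB procedure and does not compute the paper's latent association measure. The function name `simulate_threshold_model`, the bivariate Gaussian latent vector, and the independent Gaussian thresholds are all assumptions chosen for illustration. It only shows that the ordinary correlation of the observed binary values can be much weaker than the correlation of the latent vector.

```python
import numpy as np


def simulate_threshold_model(n, rho, threshold_sd, seed=0):
    """Draw latent Gaussian pairs and binarize them with random thresholds.

    Hypothetical illustration: the latent distribution and the threshold
    distribution are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Latent continuous vector with correlation rho between its two features.
    latent = rng.multivariate_normal(mean=np.zeros(2), cov=cov, size=n)
    # Independent random threshold for every sample and feature.
    thresholds = rng.normal(0.0, threshold_sd, size=(n, 2))
    # Observed binary value is 1 when the latent value exceeds its threshold.
    binary = (latent > thresholds).astype(int)
    return latent, binary


latent, binary = simulate_threshold_model(n=20000, rho=0.6, threshold_sd=1.0)
latent_corr = np.corrcoef(latent, rowvar=False)[0, 1]
binary_corr = np.corrcoef(binary, rowvar=False)[0, 1]
print(f"latent correlation: {latent_corr:.3f}")
print(f"binary correlation: {binary_corr:.3f}")
```

In this sketch the binary correlation comes out well below the latent one, which is the kind of thresholding bias a latent association measure is meant to avoid.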
- Publication: arXiv e-prints
- Pub Date: November 2017
- arXiv: arXiv:1711.10427
- Bibcode: 2017arXiv171110427M
- Keywords: Statistics - Methodology
- E-Print: 29 pages, 2 tables, 4 figures; 54 page appendix/supplemental figures