Discovering general multidimensional associations
Abstract
When two variables are related by a known function, the coefficient of determination (denoted $R^2$) measures the proportion of the total variance in the observations that is explained by that function. This quantifies the strength of the relationship between variables by describing what proportion of the variance is signal as opposed to noise. For linear relationships, this is equal to the square of the correlation coefficient, $\rho$. When the parametric form of the relationship is unknown, however, it is unclear how to estimate the proportion of explained variance equitably, assigning similar values to equally noisy relationships. Here we demonstrate how to directly estimate a generalized $R^2$ when the form of the relationship is unknown, and we question the performance of the Maximal Information Coefficient (MIC), a recently proposed information-theoretic measure of dependence. We show that our approach behaves equitably, has more power than MIC to detect association between variables, and converges faster with increasing sample size. Most importantly, our approach generalizes to higher dimensions, which allows us to estimate the strength of multivariate relationships ($Y$ against $A, B, \ldots$) and to measure association while controlling for covariates ($Y$ against $X$ controlling for $C$).
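The abstract notes that for a linear relationship $R^2$ equals the squared correlation coefficient $\rho^2$. A minimal sketch of this identity, using synthetic data and an ordinary least-squares fit (the data-generating parameters here are illustrative assumptions, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
# Linear signal plus Gaussian noise (slope and noise scale chosen arbitrarily).
y = 2.0 * x + rng.normal(scale=1.0, size=1000)

# Fit the known (linear) functional form by least squares, then compute R^2:
# the proportion of total variance explained by the fitted function.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)
r_squared = 1.0 - residuals.var() / y.var()

# For a simple linear fit with intercept, R^2 equals the squared
# Pearson correlation between x and y.
rho = np.corrcoef(x, y)[0, 1]
print(abs(r_squared - rho**2) < 1e-10)  # the two quantities agree
```

When the functional form is unknown, no parametric fit is available, which is the gap the paper's generalized $R^2$ is meant to fill.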
Publication: arXiv e-prints
Pub Date: March 2013
arXiv: arXiv:1303.1828
Bibcode: 2013arXiv1303.1828M
Keywords: Statistics - Applications
E-Print: 8 pages, 4 figures. Supporting information can be found at http://www.cs.sun.ac.za/~bmurrell/Murrell_Matie_SI.pdf