A nominal association matrix with feature selection for categorical data
Abstract
We introduce an informative probabilistic association matrix that measures the proportional local-to-global association of the categories of one variable with another categorical variable. Under probability-based proportional prediction, the association matrix yields the expected predictive distribution of the first and second types of errors for a multinomial response variable. In addition, normalizing the diagonal of the matrix yields an association vector, which provides the expected distribution of category accuracy lift rates. A general scheme of global-to-global association measures with flexible weight vectors is further developed from the matrix. A hierarchy of equivalence relations defined by the association matrix and vector is shown. Applications to financial and survey data, together with simulation results, are presented.
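The abstract does not state the matrix's formula, so the following is only an illustrative sketch of proportional prediction in general, not the paper's definition. It assumes the response Y is predicted by drawing from its conditional distribution given X, so that P(Y = s, prediction = t) = sum over x of P(x) P(s|x) P(t|x), and it divides each row by P(Y = s). The function names and the chance-corrected normalization of the diagonal are hypothetical choices made for this example.

```python
def proportional_confusion(joint):
    """joint[x][y] holds P(X=x, Y=y) or raw counts.

    Returns (g, py), where g[s][t] is the probability that a proportional
    prediction lands in category t given that the true category is s,
    and py is the marginal distribution of Y. Assumes every Y category
    has positive probability.
    """
    total = sum(sum(row) for row in joint)
    p = [[v / total for v in row] for row in joint]
    n_y = len(p[0])
    px = [sum(row) for row in p]
    py = [sum(row[s] for row in p) for s in range(n_y)]

    both = [[0.0] * n_y for _ in range(n_y)]
    for row, px_x in zip(p, px):
        if px_x == 0:
            continue  # an unobserved X category contributes nothing
        for s in range(n_y):
            for t in range(n_y):
                # P(x) * P(s|x) * P(t|x) == P(x,s) * P(x,t) / P(x)
                both[s][t] += row[s] * row[t] / px_x

    g = [[both[s][t] / py[s] for t in range(n_y)] for s in range(n_y)]
    return g, py


def chance_corrected_diagonal(g, py):
    """Hypothetical normalization (an assumption of this sketch):
    accuracy gain over guessing from the marginal of Y, scaled to [0, 1]."""
    return [(g[s][s] - py[s]) / (1.0 - py[s]) for s in range(len(py))]


if __name__ == "__main__":
    joint = [[0.4, 0.1],
             [0.1, 0.4]]
    g, py = proportional_confusion(joint)
    print(g)
    print(chance_corrected_diagonal(g, py))
```

Each row of `g` sums to 1. When X and Y are independent, every row equals the marginal of Y and the corrected diagonal is zero; when X determines Y, `g` is the identity matrix.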
- Publication: arXiv e-prints
- Pub Date: July 2013
- DOI: 10.48550/arXiv.1307.7841
- arXiv: arXiv:1307.7841
- Bibcode: 2013arXiv1307.7841H
- Keywords: Statistics - Methodology; 62H20; 62F07; 68T30
- E-Print: 24 pages, 1 figure. arXiv admin note: substantial text overlap with arXiv:1109.2553