Pair-Wise Cluster Analysis
Abstract
This paper studies the problem of learning clusters that are consistently present in different (continuously valued) representations of observed data. Our setup differs slightly from the standard approach of (co-)clustering because it exploits the fact that a form of 'labeling' becomes available: a cluster is only interesting if it has a counterpart in the alternative representation. The contribution of this paper is twofold: (i) the problem setting is explored and an analysis in terms of the PAC-Bayesian theorem is presented; (ii) a practical kernel-based algorithm is derived by exploiting the inherent relation to Canonical Correlation Analysis (CCA), together with its extension to multiple views. A content-based information retrieval (CBIR) case study on the multilingual aligned Europarl document dataset supports these findings.
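The abstract relies on the relation to Canonical Correlation Analysis: CCA finds a direction in each view such that the two projections are maximally correlated, which is one way to make "a counterpart in the alternative representation" precise. The sketch below is plain linear CCA in NumPy, not the kernel-based algorithm derived in the paper. The function name `first_canonical_correlation` and the small ridge term `reg` are assumptions added for illustration and numerical stability.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-6):
    """Hypothetical helper: leading canonical correlation between views
    X (n x p) and Y (n x q) via linear CCA with a small ridge `reg`
    (an assumption). Not the paper's kernel method."""
    X = X - X.mean(axis=0)  # center each view
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])  # regularized within-view covariances
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n                             # cross-view covariance
    # Whiten each view with the inverse Cholesky factor (Cxx = Lx Lx^T);
    # the singular values of the whitened cross-covariance are the
    # canonical correlations.
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    a = Wx.T @ U[:, 0]  # projection direction for view X
    b = Wy.T @ Vt[0]    # projection direction for view Y
    return s[0], a, b

# Usage: two synthetic views sharing one latent signal z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
X = np.column_stack([z + 0.1 * rng.normal(size=500), rng.normal(size=500)])
Y = np.column_stack([rng.normal(size=500), -z + 0.1 * rng.normal(size=500)])
rho, a, b = first_canonical_correlation(X, Y)
```

Because the shared signal is nearly noise-free in both views, `rho` should be close to 1, and the empirical correlation of `X @ a` with `Y @ b` should match it. The whitening-plus-SVD route is chosen because it returns all canonical pairs at once; a kernel variant would replace the covariance matrices with regularized Gram matrices.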
- Publication: arXiv e-prints
- Pub Date: September 2010
- DOI: 10.48550/arXiv.1009.3601
- arXiv: arXiv:1009.3601
- Bibcode: 2010arXiv1009.3601H
- Keywords: Statistics - Machine Learning; Mathematics - Statistics Theory; Statistics - Applications