Dirichlet Process with Mixed Random Measures: A Nonparametric Topic Model for Labeled Data
Abstract
We describe a nonparametric topic model for labeled data. The model uses a mixture of random measures (MRM) as the base distribution of the Dirichlet process (DP) in the hierarchical Dirichlet process (HDP) framework, so we call it the DP-MRM. To model labeled data, we define a DP-distributed random measure for each label, and the resulting model generates an unbounded number of topics for each label. We apply DP-MRM to single-labeled and multi-labeled corpora of documents and compare its label-prediction performance with MedLDA, LDA-SVM, and Labeled-LDA. We further enhance the model by incorporating the distance-dependent Chinese restaurant process (ddCRP) and modeling multi-labeled images for image segmentation and object labeling, comparing its performance with nCuts and rddCRP.
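The construction described above (one DP-distributed random measure per label, and a document-level DP whose base distribution is a mixture of those measures) can be illustrated with a small sketch. This is not the paper's implementation or inference procedure. It assumes truncated stick-breaking as an approximation to each DP, uniform mixture weights over a document's labels, and hypothetical names throughout (`stick_breaking`, `sample_label_measures`, `sample_document_measure`, `gamma`, `alpha`, `truncation`); topics are represented only by `(label, index)` identifiers rather than word distributions.

```python
import random


def stick_breaking(concentration, truncation, rng):
    """Truncated stick-breaking weights for a DP; the weights sum to 1."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, concentration)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass goes to the last atom
    return weights


def sample_label_measures(labels, gamma, truncation, rng):
    """Draw one truncated DP per label; atoms are (label, index) topic ids."""
    return {
        label: (
            [(label, t) for t in range(truncation)],
            stick_breaking(gamma, truncation, rng),
        )
        for label in labels
    }


def sample_document_measure(doc_labels, label_measures, alpha, truncation, rng):
    """Truncated draw from a DP whose base is a mixture of label measures.

    Assumption: the mixture weights over the document's labels are uniform.
    """
    doc_weights = stick_breaking(alpha, truncation, rng)
    measure = {}
    for weight in doc_weights:
        label = rng.choice(doc_labels)        # pick a label measure
        atoms, probs = label_measures[label]
        atom = rng.choices(atoms, weights=probs, k=1)[0]  # draw a topic from it
        measure[atom] = measure.get(atom, 0.0) + weight
    return measure
```

Under these assumptions, a document tagged with two labels only ever receives topics owned by those two labels, while the number of distinct topics per label is limited only by the truncation level, which stands in for the unbounded case.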
- Publication: arXiv e-prints
- Pub Date: June 2012
- DOI: 10.48550/arXiv.1206.4658
- arXiv: arXiv:1206.4658
- Bibcode: 2012arXiv1206.4658K
- Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: ICML2012