K-ANMI: A Mutual Information Based Clustering Algorithm for Categorical Data
Abstract
Clustering categorical data is an integral part of data mining and has attracted much attention recently. In this paper, we present k-ANMI, a new efficient algorithm for clustering categorical data. The k-ANMI algorithm works in a way similar to the popular k-means algorithm, and the goodness of the clustering at each step is evaluated using a mutual-information-based criterion, Average Normalized Mutual Information (ANMI), borrowed from cluster ensembles. Experimental results on real datasets show that the k-ANMI algorithm is competitive with state-of-the-art categorical data clustering algorithms with respect to clustering accuracy.
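The abstract does not spell out the criterion or the update rule, so the following is only a minimal sketch under stated assumptions: each categorical attribute is treated as one partition of the records (the usual cluster-ensemble view), NMI is taken as I(a, b) / sqrt(H(a) H(b)), and the k-means-like step is approximated by a greedy sweep that moves one record at a time whenever the move raises ANMI. The names `anmi` and `improve` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same records."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items()
    )


def nmi(a, b):
    """Normalized mutual information; defined as 0 if either labeling is constant."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)


def anmi(labels, records):
    """Average NMI between a clustering and each attribute-induced partition.

    Assumption: every attribute (column) of `records` acts as one partition.
    """
    columns = list(zip(*records))
    return sum(nmi(labels, col) for col in columns) / len(columns)


def improve(records, labels, k, max_sweeps=10):
    """Greedy sweep (an assumed stand-in for the k-means-like step).

    Each record is tentatively moved to every other cluster, and the move
    is kept only if it strictly increases ANMI.
    """
    labels = list(labels)
    best = anmi(labels, records)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(labels)):
            keep = labels[i]
            for c in range(k):
                if c == keep:
                    continue
                labels[i] = c
                score = anmi(labels, records)
                if score > best + 1e-12:
                    best, keep, changed = score, c, True
            labels[i] = keep  # restore the best label found for record i
        if not changed:
            break
    return labels, best


records = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]
final_labels, final_score = improve(records, [0, 1, 0, 1], k=2)
```

This sketch recomputes ANMI from scratch for every tentative move, which is simple but slow. An efficient implementation would presumably update the underlying counts incrementally.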
- Publication: arXiv e-prints
- Pub Date: November 2005
- DOI: 10.48550/arXiv.cs/0511013
- arXiv: arXiv:cs/0511013
- Bibcode: 2005cs.......11013H
- Keywords: Computer Science - Artificial Intelligence; Computer Science - Databases
- E-Print: 18 pages