Quantizing Multiple Sources to a Common Cluster Center: An Asymptotic Analysis
Abstract
We consider quantizing an $Ld$-dimensional sample, which is obtained by concatenating $L$ vectors from datasets of $d$-dimensional vectors, to a $d$-dimensional cluster center. The distortion measure is the weighted sum of $r$th powers of the distances between the cluster center and the samples. For $L=1$, one recovers the ordinary center based clustering formulation. The general case $L>1$ appears when one wishes to cluster a dataset through $L$ noisy observations of each of its members. We find a formula for the average distortion performance in the asymptotic regime where the number of cluster centers are large. We also provide an algorithm to numerically optimize the cluster centers and verify our analytical results on real and artificial datasets. In terms of faithfulness to the original (noiseless) dataset, our clustering approach outperforms the naive approach that relies on quantizing the $Ld$-dimensional noisy observation vectors to $Ld$-dimensional centers.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2020
- DOI:
- 10.48550/arXiv.2010.12546
- arXiv:
- arXiv:2010.12546
- Bibcode:
- 2020arXiv201012546K
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Information Theory;
- Statistics - Machine Learning