Learning mixtures of structured distributions over discrete domains
Abstract
Let $\mathfrak{C}$ be a class of probability distributions over the discrete domain $[n] = \{1,\dots,n\}$. We show that if $\mathfrak{C}$ satisfies a rather general condition -- essentially, that each distribution in $\mathfrak{C}$ can be well-approximated by a variable-width histogram with few bins -- then there is an algorithm, highly efficient in both running time and sample complexity, that can learn any mixture of $k$ unknown distributions from $\mathfrak{C}$. We analyze several natural types of distributions over $[n]$, including log-concave, monotone hazard rate, and unimodal distributions, and show that they have the required structural property of being well-approximated by a histogram with few bins. Applying our general algorithm, we obtain near-optimally efficient algorithms for all these mixture learning problems.
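The structural condition in the abstract can be illustrated with a small sketch. The code below is not the paper's learning algorithm; it only shows what "well-approximated by a variable-width histogram with few bins" might look like for one concrete distribution. The helper names (`flatten`, `total_variation`, `geometric_breakpoints`), the choice of a truncated geometric distribution, and the geometrically growing bin widths are all assumptions made for illustration.

```python
from typing import List, Sequence


def flatten(p: Sequence[float], breakpoints: Sequence[int]) -> List[float]:
    """Return the histogram that is constant on each interval
    [breakpoints[i], breakpoints[i+1]) and carries the same mass as p there."""
    q = [0.0] * len(p)
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        mass = sum(p[lo:hi])
        for i in range(lo, hi):
            q[i] = mass / (hi - lo)
    return q


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def geometric_breakpoints(n: int, ratio: float = 2.0) -> List[int]:
    """Illustrative (assumed) partition of {0, ..., n-1} into intervals
    whose widths grow geometrically: 1, 2, 4, ..."""
    points = [0]
    width = 1.0
    while points[-1] < n:
        points.append(min(n, points[-1] + max(1, int(width))))
        width *= ratio
    return points


# A truncated geometric distribution on a domain of size n (0-indexed here).
# It is monotone decreasing and log-concave.
n = 1000
weights = [0.99 ** i for i in range(n)]
total = sum(weights)
p = [w / total for w in weights]

bins = geometric_breakpoints(n)
q = flatten(p, bins)

print("number of bins:", len(bins) - 1)
print("TV distance between p and its histogram:", total_variation(p, q))
```

The number of bins grows only logarithmically in `n` for this partition, while the printed distance shows how much is lost by flattening. A mixture of histograms is itself a histogram on the union of the components' breakpoints, which is one reason the few-bins property is a natural hypothesis for mixture learning.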
- Publication: arXiv e-prints
- Pub Date: October 2012
- DOI: 10.48550/arXiv.1210.0864
- arXiv: arXiv:1210.0864
- Bibcode: 2012arXiv1210.0864C
- Keywords: Computer Science - Machine Learning; Computer Science - Data Structures and Algorithms; Mathematics - Statistics Theory
- E-Print: Preliminary full version of SODA'13 paper