From the EM Algorithm to the CM-EM Algorithm for Global Convergence of Mixture Models
Abstract
The Expectation-Maximization (EM) algorithm for mixture models often results in slow or invalid convergence. The popular convergence proof affirms that the likelihood increases with Q; Q is increasing in the M -step and non-decreasing in the E-step. The author found that (1) Q may and should decrease in some E-steps; (2) The Shannon channel from the E-step is improper and hence the expectation is improper. The author proposed the CM-EM algorithm (CM means Channel's Matching), which adds a step to optimize the mixture ratios for the proper Shannon channel and maximizes G, average log-normalized-likelihood, in the M-step. Neal and Hinton's Maximization-Maximization (MM) algorithm use F instead of Q to speed the convergence. Maximizing G is similar to maximizing F. The new convergence proof is similar to Beal's proof with the variational method. It first proves that the minimum relative entropy equals the minimum R-G (R is mutual information), then uses variational and iterative methods that Shannon et al. use for rate-distortion functions to prove the global convergence. Some examples show that Q and F should and may decrease in some E-steps. For the same example, the EM, MM, and CM-EM algorithms need about 36, 18, and 9 iterations respectively.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2018
- DOI:
- arXiv:
- arXiv:1810.11227
- Bibcode:
- 2018arXiv181011227L
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Artificial Intelligence;
- Statistics - Machine Learning;
- 68T05;
- 49N30;
- 93C41;
- 62B10;
- 68P30;
- 94A17;
- 94A34;
- H.1.1;
- H.3.3;
- G.3;
- I.1.2;
- I.2.6;
- I.4.2;
- I.5.3;
- E.4
- E-Print:
- 17 pages, 5 figures