Filling the gaps: Gaussian mixture models from noisy, truncated or incomplete samples
Abstract
Astronomical data often suffer from noise and incompleteness. We extend the common mixtures-of-Gaussians density estimation approach to account for situations with a known sample incompleteness by simultaneous imputation from the current model. The method, called GMMis, generalizes existing Expectation-Maximization techniques for truncated data to arbitrary truncation geometries and probabilistic rejection processes, as long as they can be specified and do not depend on the density itself. The method accounts for independent multivariate normal measurement errors for each of the observed samples and recovers an estimate of the error-free distribution from which both observed and unobserved samples are drawn. It can separate a mixtures-of-Gaussians signal from a specified background distribution whose amplitude may be unknown. We compare GMMis to the standard Gaussian mixture model for simple test cases with different types of incompleteness, and apply it to observational data from the NASA Chandra X-ray telescope. The Python code is released as an open-source package at https://github.com/pmelchior/pyGMMis.
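The idea of "imputation from the current model" can be illustrated with a deliberately reduced toy case. The sketch below is not the pyGMMis API and not the authors' implementation: it assumes a single one-dimensional Gaussian component, no measurement errors, no background, and a known deterministic selection function. The names `fit_truncated_gaussian` and `is_selected` are hypothetical and introduced only for this illustration. In each iteration, samples are drawn from the current model, the ones the selection would have rejected stand in for the unobserved data, and the parameters are re-estimated from the observed and imputed samples together.

```python
import numpy as np


def fit_truncated_gaussian(observed, is_selected, n_iter=200, n_draw=100_000, seed=0):
    """Toy EM-with-imputation fit of one 1D Gaussian to an incompletely observed sample.

    Illustrative assumptions: one component, no noise, known selection function.
    """
    rng = np.random.default_rng(seed)
    observed = np.asarray(observed, dtype=float)
    n_obs = observed.size
    # Naive starting point; biased because it ignores the selection.
    mu, sigma = observed.mean(), observed.std()

    for _ in range(n_iter):
        # Draw from the current model and keep what the selection would reject.
        draws = rng.normal(mu, sigma, size=n_draw)
        missed = draws[~is_selected(draws)]
        frac_selected = 1.0 - missed.size / n_draw
        if frac_selected == 0.0:
            raise RuntimeError("current model places no samples in the observed region")
        if missed.size == 0:
            # Model predicts no missing data: fall back to the observed moments.
            mu, sigma = observed.mean(), observed.std()
            continue

        # Expected number of unobserved samples implied by the current model.
        n_missing = n_obs * (1.0 - frac_selected) / frac_selected

        # Observed points carry unit weight; imputed points share the missing mass.
        values = np.concatenate([observed, missed])
        weights = np.concatenate(
            [np.ones(n_obs), np.full(missed.size, n_missing / missed.size)]
        )
        mu = np.average(values, weights=weights)
        sigma = np.sqrt(np.average((values - mu) ** 2, weights=weights))

    return mu, sigma


def is_selected(x):
    """Hypothetical selection: only values above -0.5 are observed."""
    return x > -0.5


# Simulate a standard normal sample and apply the selection.
data_rng = np.random.default_rng(42)
full_sample = data_rng.normal(0.0, 1.0, size=20_000)
observed = full_sample[is_selected(full_sample)]

naive_mu, naive_sigma = observed.mean(), observed.std()
mu_fit, sigma_fit = fit_truncated_gaussian(observed, is_selected)
print(f"naive:   mu={naive_mu:.3f}, sigma={naive_sigma:.3f}")
print(f"imputed: mu={mu_fit:.3f}, sigma={sigma_fit:.3f}")
```

In this toy setting the naive estimate is pulled toward the observed region, while the imputed fit should land much closer to the parameters used to simulate the data. The selection function enters only through `is_selected`, which is why, as the abstract states, it must be specified in advance and must not depend on the density being estimated.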
 Publication:

Astronomy and Computing
 Pub Date:
 October 2018
 DOI:
 10.1016/j.ascom.2018.09.013
 arXiv:
 arXiv:1611.05806
 Bibcode:
 2018A&C....25..183M
 Keywords:

 Density estimation;
 Multivariate Gaussian mixture model;
 Truncated data;
 Missing at random;
 Astrophysics - Instrumentation and Methods for Astrophysics;
 Astrophysics - High Energy Astrophysical Phenomena;
 Physics - Data Analysis;
 Statistics and Probability;
 Statistics - Methodology
 E-Print:
 13 pages, 5 figures, post-publication extension of section 2.3