Geometric k-nearest neighbor estimation of entropy and mutual information
Abstract
Nonparametric estimation of mutual information is used in a wide range of scientific problems to quantify dependence between variables. The k-nearest neighbor (knn) methods are consistent, and therefore expected to work well for a large sample size. These methods use geometrically regular local volume elements. This practice allows maximum localization of the volume elements, but can also induce a bias due to a poor description of the local geometry of the underlying probability measure. We introduce a new class of knn estimators that we call geometric knn estimators (g-knn), which use more complex local volume elements to better model the local geometry of the probability measures. As an example of this class of estimators, we develop a g-knn estimator of entropy and mutual information based on elliptical volume elements, capturing the local stretching and compression common to a wide range of dynamical system attractors. A series of numerical examples in which the thickness of the underlying distribution and the sample sizes are varied suggests that local geometry is a source of problems for knn methods such as the Kraskov-Stögbauer-Grassberger estimator when local geometric effects cannot be removed by global preprocessing of the data. The g-knn method performs well despite the manipulation of the local geometry. In addition, the examples suggest that the g-knn estimators can be of particular relevance to applications in which the system is large, but the data size is limited.
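For orientation, the kind of knn estimator with geometrically regular volume elements that the abstract contrasts against can be sketched as follows. This is a minimal illustration of the classical Kozachenko-Leonenko entropy estimator, which uses spherical (Euclidean ball) volume elements; it is not the g-knn estimator introduced in the paper, and the function names (`knn_entropy`, `digamma_int`, `log_unit_ball_volume`) are hypothetical. It assumes the samples contain no duplicate points and uses a brute-force neighbor search, so it is only suitable for small samples.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def log_unit_ball_volume(d):
    """Log volume of the d-dimensional Euclidean unit ball."""
    return 0.5 * d * math.log(math.pi) - math.lgamma(0.5 * d + 1.0)


def knn_entropy(samples, k=4):
    """Kozachenko-Leonenko entropy estimate, in nats.

    samples: list of equal-length tuples (one tuple per point).
    Assumes no duplicate points (a zero distance would break the log).
    """
    n = len(samples)
    d = len(samples[0])
    log_radius_sum = 0.0
    for i, x in enumerate(samples):
        # Distances from x to every other point, sorted ascending.
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        # Radius of the smallest ball around x containing k neighbors.
        log_radius_sum += math.log(dists[k - 1])
    return (digamma_int(n) - digamma_int(k)
            + log_unit_ball_volume(d) + d * log_radius_sum / n)


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
    print("estimate:", knn_entropy(pts, k=4))
    print("exact 2D standard normal entropy:", math.log(2 * math.pi * math.e))
```

The ball around each point is the "regular volume element" in this sketch. When the data are locally stretched in one direction and compressed in another, such a ball describes the local density poorly, which is the source of bias that the elliptical volume elements of the g-knn approach are meant to address.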
- Publication: Chaos
- Pub Date: March 2018
- DOI: 10.1063/1.5011683
- arXiv: arXiv:1711.00748
- Bibcode: 2018Chaos..28c3114L
- Keywords: Mathematics - Statistics Theory; Computer Science - Information Theory; Mathematics - Dynamical Systems; Statistics - Methodology
- E-Print: doi:10.1063/1.5011683