Does Dirichlet Prior Smoothing Solve the Shannon Entropy Estimation Problem?
Abstract
The Dirichlet prior is widely used in estimating discrete distributions and functionals of discrete distributions. For Shannon entropy estimation, one approach is to plug the Dirichlet prior smoothed distribution into the entropy functional, while the other is to compute the Bayes estimator for entropy under the Dirichlet prior and squared error loss, which is the conditional expectation of the entropy given the samples. We show that in general they do \emph{not} improve over the maximum likelihood estimator, which plugs the empirical distribution into the entropy functional. No matter how we tune the parameters in the Dirichlet prior, this approach cannot achieve the minimax rates in entropy estimation, as recently characterized by Jiao, Venkat, Han, and Weissman, and Wu and Yang. The performance of the minimax rate-optimal estimator with $n$ samples is essentially \emph{at least} as good as that of the Dirichlet smoothed entropy estimators with $n\ln n$ samples. We harness the theory of approximation using positive linear operators for analyzing the bias of plug-in estimators for general functionals under arbitrary statistical models, thereby further consolidating the interplay between these two fields, which was thoroughly developed and exploited by Jiao, Venkat, Han, and Weissman. We establish new results in approximation theory, and apply them to analyze the bias of the Dirichlet prior smoothed plug-in entropy estimator. This interplay between bias analysis and approximation theory is of relevance and consequence far beyond the specific problem setting in this paper.
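As a rough illustration of the three estimators the abstract compares, the following Python sketch computes the maximum likelihood (empirical plug-in) estimate, the Dirichlet smoothed plug-in estimate, and the Bayes estimate under a Dirichlet prior. It rests on several assumptions that are not taken from the paper: the prior is symmetric with a single parameter `a`, the support size is known and equals `len(counts)`, entropy is measured in nats, and the Bayes estimate uses the standard closed form for the posterior mean of entropy under a Dirichlet posterior, $\psi(A+1) - \sum_i (\alpha_i/A)\,\psi(\alpha_i+1)$ with $\alpha_i = n_i + a$ and $A = \sum_i \alpha_i$. The function names and the small `digamma` helper are hypothetical and only keep the example self-contained.

```python
import math


def digamma(x):
    """Digamma function for x > 0 (illustrative helper, not from the paper).

    Shifts x upward with psi(x) = psi(x + 1) - 1/x, then applies the
    standard asymptotic series.
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def entropy(p):
    """Shannon entropy in nats, with 0 * ln(0) treated as 0."""
    return -sum(q * math.log(q) for q in p if q > 0)


def mle_entropy(counts):
    """Plug the empirical distribution into the entropy functional."""
    n = sum(counts)
    return entropy([c / n for c in counts])


def dirichlet_plugin_entropy(counts, a):
    """Plug the Dirichlet smoothed distribution (n_i + a) / (n + S*a) into H."""
    n, size = sum(counts), len(counts)
    return entropy([(c + a) / (n + size * a) for c in counts])


def dirichlet_bayes_entropy(counts, a):
    """Posterior mean of H under a symmetric Dirichlet(a) prior (assumed form)."""
    alphas = [c + a for c in counts]
    total = sum(alphas)
    return digamma(total + 1) - sum(al / total * digamma(al + 1) for al in alphas)


if __name__ == "__main__":
    counts = [3, 1, 0, 0]  # hypothetical histogram over 4 symbols
    print("MLE:              ", mle_entropy(counts))
    print("Dirichlet plug-in:", dirichlet_plugin_entropy(counts, 1.0))
    print("Dirichlet Bayes:  ", dirichlet_bayes_entropy(counts, 1.0))
```

Setting `a = 0` in `dirichlet_plugin_entropy` recovers the maximum likelihood estimate, which makes it easy to see how the smoothing parameter moves the plug-in estimate away from the empirical one.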
- Publication: arXiv e-prints
- Pub Date: February 2015
- DOI: 10.48550/arXiv.1502.00327
- arXiv: arXiv:1502.00327
- Bibcode: 2015arXiv150200327H
- Keywords: Computer Science - Information Theory
- E-Print: 27 pages, 1 figure, published on IEEE Transactions on Information Theory, merged with https://arxiv.org/abs/1406.6959