Unified Sample-Optimal Property Estimation in Near-Linear Time
Abstract
We consider the fundamental learning problem of estimating properties of distributions over large domains. Using a novel piecewise-polynomial approximation technique, we derive the first unified methodology for constructing sample- and time-efficient estimators for all sufficiently smooth, symmetric and non-symmetric, additive properties. This technique yields near-linear-time computable estimators whose approximation values are asymptotically optimal and highly concentrated, resulting in the first: 1) estimators achieving the $\mathcal{O}(k/(\varepsilon^2\log k))$ min-max $\varepsilon$-error sample complexity for all $k$-symbol Lipschitz properties; 2) unified near-optimal differentially private estimators for a variety of properties; 3) unified estimator achieving optimal bias and near-optimal variance for five important properties; 4) near-optimal sample-complexity estimators for several important symmetric properties over both domain sizes and confidence levels. In addition, we establish a McDiarmid's inequality under Poisson sampling, which is of independent interest.
 Publication:

arXiv e-prints
 Pub Date:
 November 2019
 arXiv:
 arXiv:1911.03105
 Bibcode:
 2019arXiv191103105H
 Keywords:

 Computer Science - Machine Learning;
 Mathematics - Statistics Theory;
 Statistics - Machine Learning
 E-Print:
 To appear at NeurIPS 2019