Local Glivenko-Cantelli
Abstract
If $\mu$ is a distribution over the $d$-dimensional Boolean cube $\{0,1\}^d$, our goal is to estimate its mean $p\in[0,1]^d$ based on $n$ iid draws from $\mu$. Specifically, we consider the empirical mean estimator $\hat p_n$ and study the expected maximal deviation $\Delta_n=\mathbb{E}\max_{j\in[d]}|\hat p_n(j)-p(j)|$. In the classical Universal Glivenko-Cantelli setting, one seeks distribution-free (i.e., independent of $\mu$) bounds on $\Delta_n$. This regime is well-understood: for all $\mu$, we have $\Delta_n\lesssim\sqrt{\log(d)/n}$ up to universal constants, and the bound is tight. Our present work seeks to establish dimension-free (i.e., without an explicit dependence on $d$) estimates on $\Delta_n$, including those that hold for $d=\infty$. As such bounds must necessarily depend on $\mu$, we refer to this regime as {\em local} Glivenko-Cantelli (also known as $\mu$-GC), and are aware of very few previous bounds of this type -- which are either ``abstract'' or quite sub-optimal. Already the special case of product measures $\mu$ is rather non-trivial. We give necessary and sufficient conditions on $\mu$ for $\Delta_n\to0$, and calculate sharp rates for this decay. Along the way, we discover a novel sub-gamma-type maximal inequality for shifted Bernoullis, of independent interest.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2022
- DOI:
- arXiv:
- arXiv:2209.04054
- Bibcode:
- 2022arXiv220904054C
- Keywords:
-
- Mathematics - Probability