Quantitative asymptotics of graphical projection pursuit
Abstract
There is a result of Diaconis and Freedman which says that, in a limiting sense, for large collections of high-dimensional data most one-dimensional projections of the data are approximately Gaussian. This paper gives quantitative versions of that result. For a set of deterministic vectors $\{x_i\}_{i=1}^n$ in $\mathbb{R}^d$ with $n$ and $d$ fixed, let $\theta\in\mathbb{S}^{d-1}$ be a random point of the sphere and let $\mu_n^\theta$ denote the random measure which puts mass $\frac{1}{n}$ at each of the points $\langle x_1,\theta\rangle,\ldots,\langle x_n,\theta\rangle$. For a fixed bounded Lipschitz test function $f$, $Z$ a standard Gaussian random variable and $\sigma^2$ a suitable constant, an explicit bound is derived for the quantity $\mathbb{P}\left[\left|\int f\,d\mu_n^\theta-\mathbb{E} f(\sigma Z)\right|>\epsilon\right]$. A bound is also given for $\mathbb{P}\left[d_{BL}(\mu_n^\theta, N(0,\sigma^2))>\epsilon\right]$, where $d_{BL}$ denotes the bounded-Lipschitz distance; this yields a lower bound on the waiting time to find a non-Gaussian projection of the $\{x_i\}$ if directions are tried independently and uniformly on $\mathbb{S}^{d-1}$.
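The quantity being bounded can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's method: it draws $\theta$ uniformly on $\mathbb{S}^{d-1}$ by normalizing a standard Gaussian vector, computes $\int f\,d\mu_n^\theta$ as the average of $f(\langle x_i,\theta\rangle)$, and estimates $\mathbb{P}\left[\left|\int f\,d\mu_n^\theta-\mathbb{E} f(\sigma Z)\right|>\epsilon\right]$. Several choices here are assumptions: the normalization $\sigma^2=\frac{1}{nd}\sum_i |x_i|^2$ (chosen so that it matches $\mathbb{E}\langle x_i,\theta\rangle^2=|x_i|^2/d$), the example test function $f=\cos$ (bounded by 1 and 1-Lipschitz), and all function names, which are hypothetical. It checks one test function only, not $d_{BL}$, which takes a supremum over all bounded Lipschitz $f$. The last helper illustrates, via a plain union bound, how a bound $p$ on the per-direction probability turns into a lower bound on the waiting time $T$; the paper's own corollary may be sharper.

```python
import numpy as np


def random_direction(d, rng):
    """Uniform point on the unit sphere S^{d-1}: normalize a standard Gaussian vector."""
    g = rng.standard_normal(d)
    return g / np.linalg.norm(g)


def projection_functional(x, theta, f):
    """int f d(mu_n^theta): the average of f over the n projections <x_i, theta>."""
    return float(np.mean(f(x @ theta)))


def gaussian_expectation(f, sigma, deg=80):
    """E f(sigma Z), Z ~ N(0, 1), via probabilists' Gauss-Hermite quadrature.

    hermegauss uses weight exp(-t^2/2), whose total mass is sqrt(2*pi).
    """
    nodes, weights = np.polynomial.hermite_e.hermegauss(deg)
    return float(np.sum(weights * f(sigma * nodes)) / np.sqrt(2 * np.pi))


def deviation_probability(x, f, eps, trials=2000, seed=0):
    """Monte Carlo estimate of P[|int f dmu_n^theta - E f(sigma Z)| > eps]."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    # Assumed normalization: sigma^2 = (1/(n d)) * sum_i |x_i|^2,
    # matching E <x_i, theta>^2 = |x_i|^2 / d for uniform theta.
    sigma = np.sqrt(np.sum(x ** 2) / (n * d))
    target = gaussian_expectation(f, sigma)
    hits = 0
    for _ in range(trials):
        theta = random_direction(d, rng)
        if abs(projection_functional(x, theta, f) - target) > eps:
            hits += 1
    return hits / trials


def waiting_time_tail_bound(p, k):
    """Union-bound sketch: if each independent uniform direction is
    'non-Gaussian' with probability at most p, then the index T of the
    first such direction satisfies P[T <= k] <= k * p."""
    return min(1.0, k * p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((500, 200))  # hypothetical data: n = 500, d = 200
    print(deviation_probability(x, np.cos, eps=0.05))
    print(waiting_time_tail_bound(1e-4, 100))
```

Using a bounded test function keeps each deviation at most $2\sup|f|$, so the Monte Carlo estimate is well defined for any data set; quadrature is used for $\mathbb{E} f(\sigma Z)$ only to avoid a second source of sampling noise.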
 Publication:

arXiv e-prints
 Pub Date:
 November 2008
 arXiv:
 arXiv:0811.2769
 Bibcode:
 2008arXiv0811.2769M
 Keywords:

 Mathematics - Probability;
 Mathematics - Statistics
 E-Print:
 Proof of Theorem 2 reorganized, some additional comments on motivation, including a new corollary on waiting times to find non-Gaussian directions. To appear in Elec. Comm. Probab.