The ODE Method for Asymptotic Statistics in Stochastic Approximation and Reinforcement Learning
Abstract
The paper concerns the stochastic approximation recursion, \[ \theta_{n+1}= \theta_n + \alpha_{n + 1} f(\theta_n, \Phi_{n+1}) \,,\quad n\ge 0, \] in which the {\em estimates} $\theta_n$ take values in $\Re^d$ and $\{ \Phi_n \}$ is a Markov chain on a general state space. In addition to standard Lipschitz assumptions and conditions on the vanishing step-size sequence, it is assumed that the associated \textit{mean flow} $\tfrac{d}{dt} \vartheta_t = \bar{f}(\vartheta_t)$ is globally asymptotically stable with stationary point denoted $\theta^*$, where $\bar{f}(\theta)=\text{E}[f(\theta,\Phi)]$ with $\Phi$ having the stationary distribution of the chain. The main results are established under additional conditions on the mean flow and a version of the Donsker-Varadhan Lyapunov drift condition known as (DV3) for the chain: (i) An appropriate Lyapunov function is constructed that implies convergence of the estimates in $L_4$. (ii) A functional CLT is established, as well as the usual one-dimensional CLT for the normalized error. Moment bounds combined with the CLT imply convergence of the normalized covariance $\text{E} [ z_n z_n^T ]$ to the asymptotic covariance $\Sigma^\Theta$ in the CLT, where $z_n= (\theta_n - \theta^*)/\sqrt{\alpha_n}$. (iii) The CLT holds for the normalized version $z^{\text{PR}}_n$ of the averaged parameters $\theta^{\text{PR}}_n$, subject to standard assumptions on the step-size. Moreover, the normalized covariance of both $\theta^{\text{PR}}_n$ and $z^{\text{PR}}_n$ converges to $\Sigma^{\text{PR}}$, the minimal covariance of Polyak and Ruppert. (iv) An example is given where $f$ and $\bar{f}$ are linear in $\theta$, and the Markov chain is geometrically ergodic but does not satisfy (DV3). While the algorithm is convergent, the second moment of $\theta_n$ is unbounded and in fact diverges.
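As a rough illustration of the recursion and of Polyak-Ruppert averaging, the following minimal sketch simulates a scalar instance. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `run_sa`, the choice $f(\theta,\Phi) = -(\theta-\theta^*) + \Phi$, the symmetric two-state chain on $\{-1,+1\}$ (so that $\bar{f}(\theta) = -(\theta-\theta^*)$), and the step-size $\alpha_n = n^{-\rho}$ with $\rho\in(1/2,1)$.

```python
import random


def run_sa(n_steps=20000, theta_star=1.0, rho=0.7, p_stay=0.9, seed=0):
    """Simulate a scalar stochastic approximation recursion with Markovian noise.

    Illustrative assumptions (not from the paper):
      f(theta, phi) = -(theta - theta_star) + phi
      phi is a symmetric two-state Markov chain on {-1, +1}
      alpha_n = n ** (-rho)
    Returns the final estimate theta_n and its Polyak-Ruppert average.
    """
    rng = random.Random(seed)
    phi = 1.0        # current state of the two-state chain
    theta = 0.0      # initial estimate theta_0
    theta_pr = 0.0   # running average of theta_1, ..., theta_n

    for n in range(1, n_steps + 1):
        # Markov transition: stay with probability p_stay, otherwise flip sign.
        if rng.random() >= p_stay:
            phi = -phi

        # Vanishing step-size.
        alpha = n ** (-rho)

        # Recursion: theta_n = theta_{n-1} + alpha_n * f(theta_{n-1}, phi_n).
        theta = theta + alpha * (-(theta - theta_star) + phi)

        # Incremental update of the averaged parameter.
        theta_pr += (theta - theta_pr) / n

    return theta, theta_pr


if __name__ == "__main__":
    final_theta, averaged_theta = run_sa()
    print("final estimate:   ", final_theta)
    print("averaged estimate:", averaged_theta)
```

In this toy setting the stationary mean of the chain is zero, so the mean flow has the single stationary point `theta_star`; both returned values should land near it, with the averaged estimate typically fluctuating less than the final iterate.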
 Publication:

arXiv e-prints
 Pub Date:
 October 2021
 DOI:
 10.48550/arXiv.2110.14427
 arXiv:
 arXiv:2110.14427
 Bibcode:
 2021arXiv211014427B
 Keywords:

 Mathematics - Statistics Theory;
 Computer Science - Machine Learning;
 62L20;
 60F17;
 68T05
 E-Print:
 2 figures