Naturalistic animal behavior exhibits a strikingly complex temporal organization, whose variability stems from at least three sources: hierarchical, contextual, and stochastic. What neural mechanisms and computational principles generate such complex temporal features? In this review, we critically assess the existing behavioral and neurophysiological evidence for these sources of temporal variability in naturalistic behavior. We synthesize recent studies that converge on an emerging mechanistic theory of temporal variability based on attractor neural networks and metastable dynamics, which arise from coordinated interactions between mesoscopic neural circuits. We highlight the crucial role played by structural heterogeneities and by noise arising in mesoscopic circuits. Finally, we identify shortcomings and missing links in the current theoretical and experimental literature and propose new directions of investigation to fill these gaps.