Evaluating the design space of diffusion-based generative models
Abstract
Most existing theoretical investigations of the accuracy of diffusion models, albeit significant, assume the score function has been approximated to a certain accuracy, and then use this a priori bound to control the error of generation. This article instead provides a first quantitative understanding of the whole generation process, i.e., both training and sampling. More precisely, it conducts a non-asymptotic convergence analysis of denoising score matching under gradient descent. In addition, a refined sampling error analysis for variance exploding models is also provided. The combination of these two results yields a full error analysis, which elucidates (again, but this time theoretically) how to design the training and sampling processes for effective generation. For instance, our theory implies a preference toward noise distribution and loss weighting in training that qualitatively agree with the ones used in [Karras et al., 2022]. It also provides perspectives on the choices of time and variance schedules in sampling: when the score is well trained, the design in [Song et al., 2021] is more preferable, but when it is less trained, the design in [Karras et al., 2022] becomes more preferable.
- Publication:
-
arXiv e-prints
- Pub Date:
- June 2024
- DOI:
- 10.48550/arXiv.2406.12839
- arXiv:
- arXiv:2406.12839
- Bibcode:
- 2024arXiv240612839W
- Keywords:
-
- Computer Science - Machine Learning;
- Mathematics - Dynamical Systems;
- Mathematics - Optimization and Control;
- Mathematics - Probability;
- Statistics - Machine Learning
- E-Print:
- Comments are welcome. Out of admiration we titled our paper after EDM, and hoped theorists' humor is not too corny