An analytic theory of creativity in convolutional diffusion models

doi:10.48550/arXiv.2412.20292

We obtain the first analytic, interpretable, and predictive theory of creativity in convolutional diffusion models. Score-based diffusion models can generate highly creative images that lie far from their training data, yet optimal score-matching theory suggests that these models should only be able to reproduce memorized training examples. To reconcile this gap between theory and experiment, we identify two simple inductive biases, locality and equivariance, that: (1) induce a form of combinatorial creativity by preventing optimal score-matching; (2) result in a fully analytic, completely mechanistically interpretable, equivariant local score (ELS) machine that (3), without any training, can quantitatively predict the outputs of trained convolution-only diffusion models (like ResNets and UNets) with high accuracy (median $r^2$ of $0.90$, $0.91$, and $0.94$ on CIFAR10, FashionMNIST, and MNIST, respectively). Our ELS machine reveals a locally consistent patch mosaic model of creativity, in which diffusion models create exponentially many novel images by mixing and matching different local training set patches in different image locations. Our theory also partially predicts the outputs of pre-trained self-attention-enabled UNets (median $r^2 \sim 0.75$ on CIFAR10), revealing an intriguing role for attention in carving out semantic coherence from local patch mosaics.
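The memorization claim for optimal score-matching can be illustrated with a small sketch. It is not the ELS machine from the abstract, only the standard closed-form optimal denoiser for a finite training set, written under the assumption of a variance-preserving forward process $x_t = \sqrt{\bar\alpha}\,x_0 + \sqrt{1-\bar\alpha}\,\epsilon$. The function names, the `alpha_bar` parameter, and the toy data are illustrative and do not come from the paper.

```python
import math

def ideal_denoiser(x_t, train, alpha_bar):
    """Posterior mean E[x_0 | x_t] when x_0 is uniform over the training set.

    Assumes x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * noise
    (an illustrative choice of forward process).
    """
    scale = math.sqrt(alpha_bar)
    var = 1.0 - alpha_bar
    # Gaussian log-likelihood of x_t given each training point (up to a constant).
    logits = [
        -sum((a - scale * b) ** 2 for a, b in zip(x_t, x)) / (2.0 * var)
        for x in train
    ]
    # Numerically stable softmax over training points.
    top = max(logits)
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Softmax-weighted average of the training points.
    return [sum(w * x[d] for w, x in zip(weights, train)) for d in range(len(x_t))]

def ideal_score(x_t, train, alpha_bar):
    """Score of the noised empirical distribution, derived from the posterior mean."""
    scale = math.sqrt(alpha_bar)
    var = 1.0 - alpha_bar
    x0_hat = ideal_denoiser(x_t, train, alpha_bar)
    return [(scale * m - a) / var for m, a in zip(x0_hat, x_t)]

# Toy two-dimensional "training set" (hypothetical data).
train = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
x_t = [0.9, 1.1]

low_noise = ideal_denoiser(x_t, train, alpha_bar=0.999)   # collapses onto [1.0, 1.0]
high_noise = ideal_denoiser(x_t, train, alpha_bar=0.001)  # close to the training mean
score = ideal_score(x_t, train, alpha_bar=0.999)
```

At low noise the softmax weights concentrate on the nearest training point, so following this score can only return training examples. Per the abstract, locality and equivariance are what prevent a convolutional model from realizing this optimal solution.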

Publication: arXiv e-prints

Pub Date: December 2024

DOI: 10.48550/arXiv.2412.20292

arXiv: arXiv:2412.20292

Bibcode: 2024arXiv241220292K

Keywords: Computer Science - Machine Learning; Condensed Matter - Disordered Systems and Neural Networks; Computer Science - Artificial Intelligence; Quantitative Biology - Neurons and Cognition; Statistics - Machine Learning; I.2.10
