Energy Discrepancies: A Score-Independent Loss for Energy-Based Models
Abstract
Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) which relies on neither the computation of scores nor expensive Markov chain Monte Carlo. We show that ED approaches the explicit score matching loss and the negative log-likelihood loss under different limits, effectively interpolating between the two. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis puts limitations on our approach and demonstrate the effectiveness of energy discrepancy by training the energy-based model as a prior of a variational decoder model.
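To make the abstract's claim concrete, below is a minimal, hedged sketch of what a score-free, MCMC-free loss of this kind can look like: data points are perturbed with Gaussian noise and compared against further-perturbed contrastive samples through energy differences alone. The perturbation scale `t`, the number of contrastive samples `M`, and the stabilisation weight `w` are illustrative defaults chosen here for the sketch, not values prescribed by the paper; consult the paper for the exact estimator and its tuning.

```python
import numpy as np

def energy_discrepancy_loss(energy, x, t=0.25, M=16, w=1.0, rng=None):
    """Monte Carlo sketch of an energy-discrepancy-style loss.

    energy : callable mapping an (N, d) array to (N,) energies U_theta(.)
    x      : (N, d) batch of data points
    t, M, w: illustrative perturbation scale, number of contrastive
             samples, and stabilisation weight (assumptions of this sketch)

    Note: only energy *differences* appear below -- no score (gradient of
    the energy) and no Markov chain sampling is required.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, d = x.shape
    # Perturb each data point once: y_i = x_i + sqrt(t) * xi_i
    y = x + np.sqrt(t) * rng.standard_normal((N, d))
    # Draw M contrastive candidates around each perturbed point:
    # x'_{ij} = y_i + sqrt(t) * xi'_{ij}
    x_neg = y[:, None, :] + np.sqrt(t) * rng.standard_normal((N, M, d))
    u_pos = energy(x)                                    # (N,)
    u_neg = energy(x_neg.reshape(N * M, d)).reshape(N, M)
    diff = u_pos[:, None] - u_neg                        # (N, M)
    # Stabilised log( w/M + (1/M) * sum_j exp(diff_ij) ) via log-sum-exp
    m = diff.max(axis=1)
    lse = m + np.log(w / M * np.exp(-m) + np.exp(diff - m[:, None]).mean(axis=1))
    return lse.mean()
```

As a usage example, plugging in a simple quadratic energy `U(z) = 0.5 * ||z||^2` on a random batch returns a finite scalar loss that can be minimised with any gradient-based optimiser once `energy` is made differentiable (e.g. in PyTorch or JAX).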
- Publication:
- arXiv e-prints
- Pub Date:
- July 2023
- DOI:
- 10.48550/arXiv.2307.06431
- arXiv:
- arXiv:2307.06431
- Bibcode:
- 2023arXiv230706431S
- Keywords:
- Statistics - Machine Learning;
- Computer Science - Machine Learning
- E-Print:
- Camera Ready version for the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). Changes in this revision: Appendix A1: Corrected proof of Theorem 1. Appendix D3: Added definition and numerical experiments for energy discrepancy on binary discrete spaces. Minor changes in the main text and correction of typos. Added new references