Description length of canonical and microcanonical models
Abstract
The (non-)equivalence of canonical and microcanonical ensembles is a central concept in statistical physics. Non-equivalence has recently been established in models with an extensive number of constraints, which are common in, e.g., network science, via a non-vanishing difference in relative entropy, corresponding to a higher microcanonical log-likelihood per node. However, from a model selection perspective, comparing canonical and microcanonical models requires consideration of both log-likelihood and complexity. To compare both terms under the Minimum Description Length (MDL) principle, we compute the Normalized Maximum Likelihood (NML) of binary canonical and microcanonical models, finding that (i) microcanonical models, though higher in likelihood, are always more complex, making the choice of model non-trivial; (ii) the optimal model choice depends on the empirical values of the constraints: the canonical model performs best when its fit to the observed data exceeds its uniform average fit across all data; and (iii) notably, in the thermodynamic limit the difference in description length per node vanishes for the equivalent models considered but persists otherwise, showing that non-equivalence implies extensive differences between large canonical and microcanonical models. Finally, we compare the NML approach to Bayesian methods, showing that (iv) the choice of priors, while practically uninfluential in equivalent models, becomes crucial when an extensive number of constraints is enforced, possibly leading to very different outcomes.
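The trade-off between likelihood and complexity in findings (i) and (ii) can be illustrated with a toy sketch. It is not one of the models analysed in the paper. It assumes binary sequences of length n with a single constraint, the total number of ones k, so the canonical model is a Bernoulli model fitted to k/n and the microcanonical model is uniform over sequences with exactly k ones. All function names are hypothetical.

```python
from math import comb, log

def canonical_max_likelihood(n, k):
    # Maximised Bernoulli likelihood of one sequence with k ones (p_hat = k/n).
    p = k / n
    return p**k * (1 - p) ** (n - k)

def canonical_complexity(n):
    # NML complexity: log of the maximised likelihood summed over all 2^n
    # sequences, grouped by their number of ones. Intended for small n only.
    total = sum(comb(n, k) * canonical_max_likelihood(n, k) for k in range(n + 1))
    return log(total)

def microcanonical_complexity(n):
    # Each sequence with k ones has maximised likelihood 1 / C(n, k), so every
    # value of k contributes exactly 1 to the sum, giving log(n + 1).
    return log(n + 1)

def description_lengths(n, k):
    # NML description length = -log(maximised likelihood) + complexity (in nats).
    dl_can = -log(canonical_max_likelihood(n, k)) + canonical_complexity(n)
    dl_mic = log(comb(n, k)) + microcanonical_complexity(n)
    return dl_can, dl_mic

if __name__ == "__main__":
    n = 20
    for k in (0, 5, 10):
        dl_can, dl_mic = description_lengths(n, k)
        print(f"k={k:2d}  canonical={dl_can:.3f}  microcanonical={dl_mic:.3f}")
```

In this toy setting the microcanonical complexity log(n + 1) is never smaller than the canonical one, because each term of the canonical sum is at most 1. Which model gives the shorter description then depends on the observed k, in line with finding (ii). With only one constraint the two ensembles are equivalent, so the sketch does not reproduce the extensive differences reported for non-equivalent models.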
- Publication: arXiv e-prints
- Pub Date: July 2023
- DOI: 10.48550/arXiv.2307.05645
- arXiv: arXiv:2307.05645
- Bibcode: 2023arXiv230705645G
- Keywords: Condensed Matter - Statistical Mechanics; Physics - Data Analysis, Statistics and Probability