Establishing baselines for generative discovery of inorganic crystals
Abstract
Generative artificial intelligence offers a promising avenue for materials discovery, yet its advantages over traditional methods remain unclear. In this work, we introduce and benchmark two baseline approaches - random enumeration of charge-balanced prototypes and data-driven ion exchange of known compounds - against three generative models: a variational autoencoder, a large language model, and a diffusion model. Our results show that established methods such as ion exchange perform comparably well in generating stable materials, although many of these materials tend to closely resemble known compounds. In contrast, generative models excel at proposing novel structural frameworks and, when sufficient training data is available, can more effectively target properties such as electronic band gap and bulk modulus while maintaining a high stability rate. To enhance the performance of both the baseline and generative approaches, we implement a post-generation screening step in which all proposed structures are passed through stability and property filters from pre-trained machine learning models including universal interatomic potentials. This low-cost filtering step leads to substantial improvement in the success rates of all methods, remains computationally efficient, and ultimately provides a practical pathway toward more effective generative strategies for materials discovery.
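The first baseline, enumeration of charge-balanced prototypes, can be illustrated with a minimal sketch. This is not the authors' implementation. The oxidation-state table, the function names `charge_balanced_assignments` and `enumerate_prototype`, and the binary A_x B_y prototype are all hypothetical choices made for illustration. The sketch also enumerates candidates exhaustively, whereas the abstract describes random enumeration.

```python
from itertools import product

# Illustrative oxidation states only (an assumption for this sketch);
# a real workflow would draw on a curated table.
OXIDATION_STATES = {
    "Li": [1],
    "Na": [1],
    "Mg": [2],
    "Fe": [2, 3],
    "Ti": [2, 3, 4],
    "O": [-2],
    "Cl": [-1],
}


def charge_balanced_assignments(elements, stoichiometry):
    """Return every oxidation-state assignment that makes the formula neutral."""
    balanced = []
    # Try each combination of allowed oxidation states, one per element.
    for states in product(*(OXIDATION_STATES[el] for el in elements)):
        total_charge = sum(q * n for q, n in zip(states, stoichiometry))
        if total_charge == 0:
            balanced.append(dict(zip(elements, states)))
    return balanced


def enumerate_prototype(site_counts, cations, anions):
    """Decorate a hypothetical A_x B_y prototype with each neutral cation/anion pair."""
    hits = []
    for cation, anion in product(cations, anions):
        for assignment in charge_balanced_assignments((cation, anion), site_counts):
            hits.append((cation, anion, assignment))
    return hits
```

For example, `enumerate_prototype((1, 1), ["Li", "Mg", "Fe"], ["O", "Cl"])` keeps only the pairs whose charges sum to zero on a 1:1 prototype. Candidates that pass this check would still need the stability and property screening that the abstract describes.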
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.02144
- Bibcode: 2025arXiv250102144S
- Keywords: Condensed Matter - Materials Science; Computer Science - Artificial Intelligence; Physics - Chemical Physics