AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands
Abstract
Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Then, we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. We also validate these predictions via docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2021
- DOI:
- 10.48550/arXiv.2112.13168
- arXiv:
- arXiv:2112.13168
- Bibcode:
- 2021arXiv211213168C
- Keywords:
-
- Quantitative Biology - Quantitative Methods;
- Computer Science - Machine Learning
- E-Print:
- 83 pages, 26 figures, all references moved to a single section, new results added on AI interpretability, added comparison with MolTrans, added validation using gold standard experimental data