LaGDif: Latent Graph Diffusion Model for Efficient Protein Inverse Folding with Self-Ensemble
Abstract
Protein inverse folding aims to identify viable amino acid sequences that can fold into given protein structures, enabling the design of novel proteins with desired functions for applications in drug discovery, enzyme engineering, and biomaterial development. Diffusion probabilistic models have emerged as a promising approach in inverse folding, offering both feasible and diverse solutions compared to traditional energy-based methods and more recent protein language models. However, existing diffusion models for protein inverse folding operate in discrete data spaces, necessitating prior distributions for transition matrices and limiting smooth transitions and gradients inherent to continuous spaces, leading to suboptimal performance. Drawing inspiration from the success of diffusion models in continuous domains, we introduce the Latent Graph Diffusion Model for Protein Inverse Folding (LaGDif). LaGDif bridges discrete and continuous realms through an encoder-decoder architecture, transforming protein graph data distributions into random noise within a continuous latent space. Our model then reconstructs protein sequences by considering spatial configurations, biochemical attributes, and environmental factors of each node. Additionally, we propose a novel inverse folding self-ensemble method that stabilizes prediction results and further enhances performance by aggregating multiple denoised output protein sequence. Empirical results on the CATH dataset demonstrate that LaGDif outperforms existing state-of-the-art techniques, achieving up to 45.55% improvement in sequence recovery rate for single-chain proteins and maintaining an average RMSD of 1.96 Å between generated and native structures. The code is public available at https://github.com/TaoyuW/LaGDif.
- Publication:
-
arXiv e-prints
- Pub Date:
- November 2024
- DOI:
- arXiv:
- arXiv:2411.01737
- Bibcode:
- 2024arXiv241101737W
- Keywords:
-
- Quantitative Biology - Quantitative Methods