RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching
Abstract
RNA plays a crucial role in diverse life processes. In contrast to the rapid advancement of protein design methods, RNA design remains more challenging. Most current RNA design approaches target specific attributes and rely on extensive experimental searches, which remain costly and inefficient due to practical limitations. In this paper, we frame all sequence design problems as conditional generation tasks and offer parameterized representations for multiple problems. For these problems, we have developed RNACG, a universal RNA sequence generation model based on flow matching. RNACG accommodates various conditional inputs and is portable: users can customize the encoding network for conditional inputs to suit their requirements and integrate it into the generation network. We evaluated RNACG on RNA 3D structure inverse folding, 2D structure inverse folding, family-specific sequence generation, and 5'UTR translation efficiency prediction. RNACG attains superior or competitive performance on these tasks compared with other methods. It shows broad applicability in sequence generation and property prediction tasks, providing a novel approach to RNA sequence design and potential methods for simulation experiments with large-scale RNA sequence data.
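To make the idea of conditional flow matching concrete, here is a minimal sketch in Python with NumPy. It is not the RNACG implementation: the abstract does not specify the probability path, the network, or the condition encoder, so this sketch assumes a generic linear path from Gaussian noise to a one-hot encoded sequence. All names (`one_hot`, `flow_matching_loss`, `sample`, `oracle_velocity`) are hypothetical, and `oracle_velocity` is a toy stand-in for a trained network in which the "condition" is simply the target sequence itself.

```python
import numpy as np

NUCLEOTIDES = "ACGU"


def one_hot(seq):
    """Encode an RNA string as a (length, 4) one-hot matrix."""
    idx = np.array([NUCLEOTIDES.index(c) for c in seq])
    return np.eye(len(NUCLEOTIDES))[idx]


def flow_matching_loss(velocity_fn, x1, cond, rng):
    """One-sample flow matching loss on an assumed linear path.

    x_t = (1 - t) * x0 + t * x1, with regression target x1 - x0.
    """
    x0 = rng.standard_normal(x1.shape)  # noise endpoint
    t = rng.uniform()                   # time in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1       # point on the path
    v_target = x1 - x0                  # velocity of the linear path
    v_pred = velocity_fn(x_t, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))


def sample(velocity_fn, cond, length, rng, steps=50):
    """Integrate dx/dt = v(x, t, cond) from noise with Euler steps."""
    x = rng.standard_normal((length, len(NUCLEOTIDES)))
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    # Decode each position to its highest-scoring nucleotide.
    return "".join(NUCLEOTIDES[i] for i in x.argmax(axis=1))


def oracle_velocity(x, t, cond):
    """Toy stand-in for a trained network (assumption, not RNACG).

    Here cond is the one-hot target, so the exact conditional
    velocity of the linear path is (cond - x) / (1 - t).
    """
    return (cond - x) / (1.0 - t)


rng = np.random.default_rng(0)
target = one_hot("GGAUCC")
print(flow_matching_loss(oracle_velocity, target, target, rng))
print(sample(oracle_velocity, target, len(target), rng))
```

In a real setting, `velocity_fn` would be a learned network and `cond` would be the output of a task-specific encoder (for example, of a 2D or 3D structure), which is the part the abstract describes as user-customizable.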
- Publication: arXiv e-prints
- Pub Date: July 2024
- DOI: 10.48550/arXiv.2407.19838
- arXiv: arXiv:2407.19838
- Bibcode: 2024arXiv240719838G
- Keywords: Quantitative Biology - Biomolecules; Computer Science - Machine Learning