Retrieval-Based Neural Code Generation
Abstract
In models to generate program source code from natural language, representing this code in a tree structure has been a common approach. However, existing methods often fail to generate complex code correctly due to a lack of ability to memorize large and complex structures. We introduce ReCode, a method based on subtree retrieval that makes it possible to explicitly reference existing code examples within a neural code generation model. First, we retrieve sentences that are similar to input sentences using a dynamic-programming-based sentence similarity scoring method. Next, we extract n-grams of action sequences that build the associated abstract syntax tree. Finally, we increase the probability of actions that cause the retrieved n-gram action subtree to be in the predicted code. We show that our approach improves the performance on two code generation tasks by up to +2.6 BLEU.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2018
- DOI:
- 10.48550/arXiv.1808.10025
- arXiv:
- arXiv:1808.10025
- Bibcode:
- 2018arXiv180810025A
- Keywords:
-
- Computer Science - Computation and Language
- E-Print:
- This paper is accepted in EMNLP 2018. It has 6 pages