Learning Semantic Vector Representations of Source Code via a Siamese Neural Network
Abstract
The abundance of open-source code, coupled with the success of recent advances in deep learning for natural language processing, has given rise to a promising new application of machine learning to source code. In this work, we explore the use of a Siamese recurrent neural network model on Python source code to create vectors that capture the semantics of code. We evaluate the quality of the embeddings by identifying which problem from a programming competition a given piece of code solves. Our model significantly outperforms a bag-of-tokens embedding, providing promising results for improving code embeddings that can be used in future software engineering tasks.
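As an illustration of the kind of baseline the abstract compares against, the following is a minimal sketch of a bag-of-tokens embedding for Python source, built only on the standard library. The abstract does not specify how the authors tokenize code or weight tokens, so the function names (`bag_of_tokens`, `cosine_similarity`), the choice to drop comments and layout tokens, and the use of raw counts are all assumptions made here for illustration.

```python
import io
import math
import tokenize
from collections import Counter

# Token types that carry only comments or layout; dropping them is an
# assumption of this sketch, not something stated in the abstract.
_SKIPPED_TYPES = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENCODING,
    tokenize.ENDMARKER,
}


def bag_of_tokens(source: str) -> Counter:
    """Count lexical tokens in Python source, ignoring their order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return Counter(tok.string for tok in tokens if tok.type not in _SKIPPED_TYPES)


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for empty inputs
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    loop_version = "total = 0\nfor x in xs:\n    total += x\n"
    builtin_version = "total = sum(xs)\n"
    score = cosine_similarity(bag_of_tokens(loop_version), bag_of_tokens(builtin_version))
    print(f"similarity: {score:.3f}")
```

Because the representation discards token order, `bag_of_tokens("d = a - b\n")` and `bag_of_tokens("d = b - a\n")` are equal even though the two statements compute different values. A sequence model such as the recurrent network described in the abstract can, in principle, tell such inputs apart.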
- Publication: arXiv e-prints
- Pub Date: April 2019
- DOI: 10.48550/arXiv.1904.11968
- arXiv: arXiv:1904.11968
- Bibcode: 2019arXiv190411968W
- Keywords: Computer Science - Machine Learning; Computer Science - Programming Languages; Computer Science - Software Engineering; Statistics - Machine Learning