Enhancing Binary Code Comment Quality Classification: Integrating Generative AI for Improved Accuracy

doi:10.48550/arXiv.2310.11467


This report focuses on improving the accuracy of a binary code comment quality classification model by augmenting its training data with generated code and comment pairs. The dataset comprises 9048 pairs of code and comments written in the C programming language, each annotated as "Useful" or "Not Useful". Additional code and comment pairs are generated using a Large Language Model (LLM) architecture, and each generated pair is labeled to indicate its utility. The outcome of this effort is two classification models: one trained on the original dataset and another trained on the augmented dataset, which adds the newly generated code comment pairs and their labels.
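The abstract does not name the classifier or the feature representation, so the following is only a rough sketch of the two-model comparison it describes: one classifier trained on original pairs, a second trained on original plus generated pairs, both scored on the same held-out pairs. Everything specific here is an assumption for illustration: the hypothetical `Pair` record, the toy C snippets standing in for the 9048 annotated pairs and the LLM-generated pairs, and the use of a multinomial naive Bayes over bag-of-words tokens as a stand-in classifier. It is not the authors' method.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


# Hypothetical record: one (code, comment) pair with a binary label.
@dataclass
class Pair:
    code: str
    comment: str
    label: str  # "Useful" or "Not Useful"


def tokens(pair):
    # Bag of lowercase word/identifier tokens taken from both comment and code.
    return re.findall(r"[a-z_][a-z0-9_]*", (pair.comment + " " + pair.code).lower())


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (a stand-in classifier)."""

    def fit(self, pairs):
        self.class_counts = Counter(p.label for p in pairs)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for p in pairs:
            self.word_counts[p.label].update(tokens(p))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, pair):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n in self.class_counts.items():
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            score = math.log(n / total)  # log prior for class c
            for t in tokens(pair):
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                score += math.log((self.word_counts[c][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def accuracy(model, pairs):
    return sum(model.predict(p) == p.label for p in pairs) / len(pairs)


# Toy stand-ins (hypothetical) for the original annotated pairs.
original = [
    Pair("int add(int a, int b) { return a + b; }", "returns the sum of a and b", "Useful"),
    Pair("i++;", "increment i", "Not Useful"),
    Pair("free(buf);", "release the buffer allocated for the packet", "Useful"),
    Pair("x = 0;", "set x to zero", "Not Useful"),
]
# Toy stand-ins (hypothetical) for LLM-generated pairs labeled afterwards.
generated = [
    Pair("if (n < 0) return -1;", "reject negative sizes before allocating memory", "Useful"),
    Pair("count--;", "decrement count", "Not Useful"),
]
# Held-out pairs used to score both models.
test = [
    Pair("close(fd);", "release the file descriptor so it can be reused", "Useful"),
    Pair("y = 1;", "set y to one", "Not Useful"),
]

baseline = NaiveBayes().fit(original)               # model 1: original data only
augmented = NaiveBayes().fit(original + generated)  # model 2: augmented data
baseline_acc = accuracy(baseline, test)
augmented_acc = accuracy(augmented, test)
print(f"baseline accuracy:  {baseline_acc:.2f}")
print(f"augmented accuracy: {augmented_acc:.2f}")
```

Comparing `baseline_acc` with `augmented_acc` on the same held-out pairs mirrors the original-versus-augmented comparison in the abstract; with only a handful of toy pairs the printed numbers say nothing about real performance and serve only to show the workflow.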

Publication:

arXiv e-prints

Pub Date:

October 2023

DOI:

10.48550/arXiv.2310.11467

arXiv:

arXiv:2310.11467

Bibcode:

2023arXiv231011467A

Keywords:

Computer Science - Software Engineering;
Computer Science - Artificial Intelligence;
Computer Science - Machine Learning

E-Print:

11 pages, 2 figures, 2 tables; accepted for the Information Retrieval in Software Engineering track at the Forum for Information Retrieval Evaluation 2023
