Understanding Defects in Generated Codes by Language Models
Abstract
This study investigates the reliability of code generation by Large Language Models (LLMs), focusing on identifying and analyzing defects in the generated code. Despite the advanced capabilities of LLMs in automating code generation, ensuring the accuracy and functionality of the output remains a significant challenge. Using a structured defect classification method to understand the nature and origins of the defects, this study categorizes and analyzes 367 defects identified in code snippets generated by LLMs, a significant proportion of which are functionality and algorithm errors. These error categories indicate key areas where LLMs frequently fail, underscoring the need for targeted improvements. To enhance the accuracy of code generation, this study applied five prompt engineering techniques: Scratchpad Prompting, Program of Thoughts Prompting, Chain-of-Thought Prompting, Chain of Code Prompting, and Structured Chain-of-Thought Prompting. These techniques were used to refine the input prompts, aiming to reduce ambiguity and improve the models' accuracy. The findings suggest that precise, structured prompting significantly mitigates common defects, thereby increasing the reliability of LLM-generated code.
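The abstract does not reproduce the prompt templates themselves. As a rough illustration of how structured prompting differs from a bare request, the sketch below builds a plain prompt, a Chain-of-Thought-style prompt, and a Structured Chain-of-Thought-style prompt for the same task. The task description, the template wording, and the constant names are assumptions made for illustration, not material from the paper.

```python
# Illustrative sketch only: minimal prompt templates in the spirit of two of
# the techniques named in the abstract (Chain-of-Thought and Structured
# Chain-of-Thought). The task and wording are assumed, not from the paper.

TASK = "Write a Python function that returns the k most frequent words in a text."

# Baseline: the bare request, with no guidance on reasoning.
PLAIN_PROMPT = TASK

# Chain-of-Thought style: ask the model to reason step by step before coding.
COT_PROMPT = (
    f"{TASK}\n"
    "Think through the problem step by step: restate the requirements, "
    "outline the algorithm, consider edge cases (empty text, ties in counts), "
    "and only then write the final function."
)

# Structured Chain-of-Thought style: constrain the intermediate reasoning to
# program structures (input/output, sequence, branching, loops) before code.
SCOT_PROMPT = (
    f"{TASK}\n"
    "First describe the solution using program structures:\n"
    "1. Input/Output: specify parameters and return type.\n"
    "2. Sequence: list the main steps in order.\n"
    "3. Branch: note any conditional handling (e.g., ties, empty input).\n"
    "4. Loop: describe any iteration over tokens or counts.\n"
    "Then write the final Python function that follows this plan."
)

if __name__ == "__main__":
    # Print the three prompt variants so they can be compared side by side.
    for name, prompt in [("plain", PLAIN_PROMPT),
                         ("chain-of-thought", COT_PROMPT),
                         ("structured chain-of-thought", SCOT_PROMPT)]:
        print(f"--- {name} ---\n{prompt}\n")
```

The design choice the templates illustrate is the one the abstract points to: adding explicit reasoning structure to the prompt reduces ambiguity in what the model is asked to produce, which is the mechanism the study credits for fewer functionality and algorithm defects.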
- Publication: arXiv e-prints
- Pub Date: August 2024
- DOI: 10.48550/arXiv.2408.13372
- arXiv: arXiv:2408.13372
- Bibcode: 2024arXiv240813372E
- Keywords: Computer Science - Software Engineering; Computer Science - Artificial Intelligence