CoverUp: Coverage-Guided LLM-Based Test Generation

doi:10.48550/arXiv.2403.16218

Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains a challenge. This paper proposes CoverUp, a novel approach to driving the generation of high-coverage Python regression tests. CoverUp improves test coverage iteratively: it interleaves coverage analysis with LLM dialogs that steer the model to refine its tests so that they cover more lines and branches. We evaluate our prototype CoverUp implementation on a benchmark of challenging code derived from open-source Python projects, and show that CoverUp substantially improves on the state of the art. Compared to CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%). Compared to MuTAP, a mutation/LLM-based test generator, CoverUp achieves an overall line+branch coverage of 90% (vs. 77%). We show that CoverUp's iterative, coverage-guided approach is crucial to its effectiveness, contributing to nearly 40% of its successes.
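The iterative, coverage-guided idea in the abstract can be illustrated with a minimal Python sketch. This is not CoverUp's implementation: the function `coverage_guided_loop`, its parameters, the prompt wording, and the scripted stand-ins for the LLM and the coverage tool are all hypothetical, chosen only to show the shape of a loop that measures coverage, asks for a test aimed at what is still missing, and keeps a test only if it adds coverage.

```python
from typing import Callable, List, Set, Tuple


def coverage_guided_loop(
    target_lines: Set[int],
    ask_llm: Callable[[str], str],       # hypothetical: prompt -> test source
    measure: Callable[[str], Set[int]],  # hypothetical: test source -> lines executed
    max_rounds: int = 3,
) -> Tuple[List[str], Set[int]]:
    """Keep tests that add coverage; re-prompt about whatever is still missing."""
    kept: List[str] = []
    covered: Set[int] = set()
    for _ in range(max_rounds):
        missing = target_lines - covered
        if not missing:
            break  # every target line is already covered
        # Coverage feedback goes back into the next prompt.
        prompt = f"Write a pytest test that executes lines {sorted(missing)}."
        test = ask_llm(prompt)
        gained = (measure(test) & target_lines) - covered
        if gained:  # discard tests that add nothing new
            kept.append(test)
            covered |= gained
    return kept, covered


def demo() -> Tuple[List[str], Set[int]]:
    # Scripted stand-ins so the sketch runs without an LLM or a coverage tool.
    canned = iter(["test_a", "test_useless", "test_b"])
    fake_coverage = {"test_a": {1, 2}, "test_useless": {1}, "test_b": {3, 4}}
    return coverage_guided_loop(
        target_lines={1, 2, 3, 4},
        ask_llm=lambda prompt: next(canned),
        measure=lambda test: fake_coverage[test],
        max_rounds=5,
    )
```

In the scripted run, the second response covers only a line that is already covered, so it is dropped and the loop prompts again for the remaining lines. A real setup would replace `measure` with an actual coverage run and `ask_llm` with a model call.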

Publication:

arXiv e-prints

Pub Date:

March 2024

DOI:

10.48550/arXiv.2403.16218

arXiv:

arXiv:2403.16218

Bibcode:

2024arXiv240316218A

Keywords:

Computer Science - Software Engineering;
Computer Science - Artificial Intelligence;
Computer Science - Machine Learning;
Computer Science - Programming Languages

E-Print:

17 pages
