CoverUp: Coverage-Guided LLM-Based Test Generation
Abstract
Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains a challenge. This paper proposes CoverUp, a novel approach to driving the generation of high-coverage Python regression tests. CoverUp iteratively improves test coverage, interleaving coverage analysis with dialogs with the LLM that steer it to refine tests so that they increase coverage of lines and branches. We evaluate our prototype CoverUp implementation across a benchmark of challenging code derived from open-source Python projects, and show that CoverUp substantially improves on the state of the art. Compared to CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%). Compared to MuTAP, a mutation/LLM-based test generator, CoverUp achieves an overall line+branch coverage of 90% (vs. 77%). We show that CoverUp's iterative, coverage-guided approach is crucial to its effectiveness, contributing to nearly 40% of its successes.
- Publication:
-
arXiv e-prints
- Pub Date:
- March 2024
- DOI:
- 10.48550/arXiv.2403.16218
- arXiv:
- arXiv:2403.16218
- Bibcode:
- 2024arXiv240316218A
- Keywords:
-
- Computer Science - Software Engineering;
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning;
- Computer Science - Programming Languages
- E-Print:
- 17 pages