Inputs from Hell: Generating Uncommon Inputs from Common Samples
Abstract
Generating structured input files to test programs can be performed by techniques that produce them from a grammar that serves as the specification for syntactically correct input files. Two interesting scenarios then arise for effective testing. In the first scenario, software engineers would like to generate inputs that are as similar as possible to the inputs in common usage of the program, to test the reliability of the program. More interesting is the second scenario where inputs should be as dissimilar as possible from normal usage. This is useful for robustness testing and exploring yet uncovered behavior. To provide test cases for both scenarios, we leverage a context-free grammar to parse a set of sample input files that represent the program's common usage, and determine probabilities for individual grammar production as they occur during parsing the inputs. Replicating these probabilities during grammar-based test input generation, we obtain inputs that are close to the samples. Inverting these probabilities yields inputs that are strongly dissimilar to common inputs, yet still valid with respect to the grammar. Our evaluation on three common input formats (JSON, JavaScript, CSS) shows the effectiveness of these approaches in obtaining instances from both sets of inputs.
- Publication:
-
arXiv e-prints
- Pub Date:
- December 2018
- DOI:
- 10.48550/arXiv.1812.07525
- arXiv:
- arXiv:1812.07525
- Bibcode:
- 2018arXiv181207525P
- Keywords:
-
- Computer Science - Software Engineering