Improving Requirements Classification with SMOTE-Tomek Preprocessing
Abstract
This study addresses class imbalance in requirements engineering by applying the SMOTE-Tomek preprocessing technique, combined with stratified K-fold cross-validation, to the PROMISE dataset. The dataset comprises 969 categorized requirements, classified into functional and non-functional types. The proposed approach improves the representation of minority classes while preserving the integrity of the validation folds, leading to a notable improvement in classification accuracy. Logistic regression achieved 76.16% accuracy, well above the baseline of 58.31%. These results highlight the applicability and efficiency of machine learning models as scalable and interpretable solutions for requirements classification.
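The preprocessing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, toy two-dimensional points in place of vectorized PROMISE requirements, and hypothetical helper names (`stratified_folds`, `smote`, `remove_tomek_links`). It assumes that "maintaining the integrity of validation folds" means resampling only the training portion of each stratified fold, and that both members of a Tomek link are removed. In practice a library such as imbalanced-learn, which provides `SMOTETomek`, would normally be used instead.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, rng):
    """Split indices into k folds with roughly equal class ratios."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds


def smote(X, y, rng, k_neighbors=5):
    """Oversample smaller classes by interpolating towards same-class neighbours."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for cls, count in counts.items():
        members = [x for x, label in zip(X, y) if label == cls]
        if len(members) < 2:
            continue  # no neighbour to interpolate towards
        for _ in range(target - count):
            base = rng.choice(members)
            others = [m for m in members if m is not base]
            closest = sorted(others, key=lambda m: math.dist(base, m))[:k_neighbors]
            neighbour = rng.choice(closest)
            gap = rng.random()  # position along the segment base -> neighbour
            X_out.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
            y_out.append(cls)
    return X_out, y_out


def remove_tomek_links(X, y):
    """Drop both points of every mutual-nearest-neighbour pair with different labels."""
    def nearest_index(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: math.dist(X[i], X[j]))

    nearest = [nearest_index(i) for i in range(len(X))]
    drop = {i for i, j in enumerate(nearest) if nearest[j] == i and y[i] != y[j]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def resample_training_folds(X, y, k=5, seed=0):
    """For each fold, resample the training part only; the held-out part stays untouched."""
    rng = random.Random(seed)
    results = []
    for held_out in stratified_folds(y, k, rng):
        held = set(held_out)
        train_X = [X[i] for i in range(len(X)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        res_X, res_y = remove_tomek_links(*smote(train_X, train_y, rng))
        results.append((res_X, res_y, held_out))
    return results


# Toy imbalanced data (an assumption for illustration): 40 majority, 10 minority points.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(40)]
X += [(data_rng.gauss(2, 1), data_rng.gauss(2, 1)) for _ in range(10)]
y = [0] * 40 + [1] * 10

results = resample_training_folds(X, y, k=5, seed=0)
for res_X, res_y, held_out in results:
    print(Counter(res_y), "held-out size:", len(held_out))
```

A classifier such as logistic regression would then be fitted on each resampled training set and scored on the corresponding untouched held-out fold, so that synthetic samples never leak into validation.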
- Publication: arXiv e-prints
- Pub Date: January 2025
- arXiv: arXiv:2501.06491
- Bibcode: 2025arXiv250106491O
- Keywords: Computer Science - Software Engineering; Computer Science - Artificial Intelligence; Electrical Engineering and Systems Science - Systems and Control
- E-Print: 8 pages, 5 figures