On the definition of toxicity in NLP
Abstract
The fundamental problem in toxicity detection task lies in the fact that the toxicity is ill-defined. This causes us to rely on subjective and vague data in models' training, which results in non-robust and non-accurate results: garbage in - garbage out. This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. On par with it, we also describe possible ways of applying this new definition to dataset creation and model training.
- Publication:
-
arXiv e-prints
- Pub Date:
- October 2023
- DOI:
- 10.48550/arXiv.2310.02357
- arXiv:
- arXiv:2310.02357
- Bibcode:
- 2023arXiv231002357B
- Keywords:
-
- Computer Science - Computation and Language;
- Computer Science - Artificial Intelligence;
- Computer Science - Machine Learning