What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes
Abstract
The F-measure or F-score is one of the most commonly used single-number measures in Information Retrieval, Natural Language Processing and Machine Learning, but it is based on a mistake, and the flawed assumptions render it unsuitable for use in most contexts! Fortunately, there are better alternatives.
- Publication: arXiv e-prints
- Pub Date: March 2015
- DOI: 10.48550/arXiv.1503.06410
- arXiv: arXiv:1503.06410
- Bibcode: 2015arXiv150306410P
- Keywords: Computer Science - Information Retrieval; Computer Science - Computation and Language; Computer Science - Machine Learning; Computer Science - Neural and Evolutionary Computing; Statistics - Computation; Statistics - Machine Learning; 68T05; 68Q32; 91E45; D.2.8; I.2.6; I.2.7; I.4.6; I.5.1
- E-Print: 19 pages