Using theoretical ROC curves for analysing machine learning binary classifiers
Abstract
Most binary classifiers work by processing the input to produce a scalar response and comparing it to a threshold value. The various measures of classifier performance assume, explicitly or implicitly, probability distributions $P_s$ and $P_n$ of the response belonging to either class, probability distributions for the cost of each type of misclassification, and compute a performance score from the expected cost. In machine learning, classifier responses are obtained experimentally and performance scores are computed directly from them, without any assumptions on $P_s$ and $P_n$. Here, we argue that the omitted step of estimating theoretical distributions for $P_s$ and $P_n$ can be useful. In a biometric security example, we fit beta distributions to the responses of two classifiers, one based on logistic regression and one on ANNs, and use them to establish a categorisation into a small number of classes with different extremal behaviours at the ends of the ROC curves.
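The step the abstract describes, fitting beta distributions to the responses of each class and reading a theoretical ROC curve off the fitted distributions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the responses are synthetic draws (the paper's biometric data is not reproduced here), the shape parameters (2, 5) and (5, 2) are arbitrary, the helper names `fit_beta` and `theoretical_roc` are hypothetical, and maximum-likelihood fitting via SciPy is assumed as the estimation method.

```python
import numpy as np
from scipy import stats


def fit_beta(responses, eps=1e-6):
    """Fit a beta distribution on [0, 1] to scalar classifier responses.

    Responses are clipped away from 0 and 1 because the fixed-support
    maximum-likelihood fit requires data strictly inside (0, 1).
    """
    x = np.clip(np.asarray(responses, dtype=float), eps, 1.0 - eps)
    a, b, _, _ = stats.beta.fit(x, floc=0.0, fscale=1.0)
    return a, b


def theoretical_roc(signal_params, noise_params, n_points=201):
    """Return (fpr, tpr) implied by the fitted P_s and P_n.

    For a threshold t, a response above t is classified as signal, so
    TPR(t) = P_s(response > t) and FPR(t) = P_n(response > t).
    """
    thresholds = np.linspace(0.0, 1.0, n_points)
    tpr = stats.beta.sf(thresholds, *signal_params)
    fpr = stats.beta.sf(thresholds, *noise_params)
    return fpr, tpr


# Synthetic stand-ins for experimentally obtained responses (assumption).
rng = np.random.default_rng(0)
noise_responses = rng.beta(2.0, 5.0, size=2000)   # class "n"
signal_responses = rng.beta(5.0, 2.0, size=2000)  # class "s"

noise_params = fit_beta(noise_responses)
signal_params = fit_beta(signal_responses)
fpr, tpr = theoretical_roc(signal_params, noise_params)

# Area under the theoretical curve by the trapezoidal rule; the arrays are
# reversed so that FPR is increasing.
f, t = fpr[::-1], tpr[::-1]
auc = float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2.0))
print("fitted P_n:", noise_params, "fitted P_s:", signal_params, "AUC:", auc)
```

Because the curve comes from closed-form distributions rather than a finite sample, it can be evaluated arbitrarily close to the corners (0, 0) and (1, 1), which is where the extremal behaviour mentioned in the abstract shows up.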
 Publication:
 arXiv e-prints
 Pub Date:
 September 2019
 arXiv:
 arXiv:1909.09816
 Bibcode:
 2019arXiv190909816O
 Keywords:
 Computer Science - Machine Learning;
 Computer Science - Computer Vision and Pattern Recognition;
 Statistics - Machine Learning
 EPrint:
 10 pages, 4 figures