Valid inferential models for prediction in supervised learning problems

doi:10.48550/arXiv.2112.10234

Prediction, where observed data is used to quantify uncertainty about a future observation, is a fundamental problem in statistics. Prediction sets with coverage probability guarantees are a common solution, but these do not provide probabilistic uncertainty quantification in the sense of assigning beliefs to relevant assertions about the future observable. Alternatively, we recommend the use of a probabilistic predictor, a data-dependent (imprecise) probability distribution for the to-be-predicted observation given the observed data. It is essential that the probabilistic predictor be reliable or valid, and here we offer a notion of validity and explore its behavioral and statistical implications. In particular, we show that valid probabilistic predictors must be imprecise, that they avoid sure loss, and that they lead to prediction procedures with desirable frequentist error rate control properties. We provide a general construction of a provably valid probabilistic predictor, which has close connections to the powerful conformal prediction machinery, and we illustrate this construction in regression and classification applications.
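The abstract ties the construction to conformal prediction. As a rough illustration only, and not the authors' construction, the sketch below computes a full-conformal plausibility contour for a new response in simple linear regression. It then reads off an upper probability for an assertion as the supremum of that contour, which gives an imprecise (possibility-type) predictor. It also forms the prediction set {y : plausibility(y) > alpha}, which under exchangeability has coverage at least 1 - alpha. The nonconformity score (absolute OLS residual), the grid approximation, and the names `fit_line`, `plausibility`, `upper_probability`, and `prediction_set` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # assumes the x-values are not all equal
    return my - b * mx, b


def plausibility(xs: Sequence[float], ys: Sequence[float],
                 x_new: float, y_cand: float) -> float:
    """Full-conformal plausibility of the candidate response y_cand at x_new.

    Assumed nonconformity score: absolute residual from an OLS line fitted to
    the data augmented with the candidate pair (x_new, y_cand). The value is
    the fraction of scores at least as large as the candidate's own score.
    """
    ax = list(xs) + [x_new]
    ay = list(ys) + [y_cand]
    a, b = fit_line(ax, ay)
    scores = [abs(y - (a + b * x)) for x, y in zip(ax, ay)]
    s_new = scores[-1]
    return sum(s >= s_new for s in scores) / len(scores)


def upper_probability(xs: Sequence[float], ys: Sequence[float],
                      x_new: float, assertion_grid: Sequence[float]) -> float:
    """Upper probability of an assertion A, approximated by the largest
    plausibility over a finite grid of y-values lying in A."""
    return max(plausibility(xs, ys, x_new, y) for y in assertion_grid)


def prediction_set(xs: Sequence[float], ys: Sequence[float], x_new: float,
                   grid: Sequence[float], alpha: float) -> List[float]:
    """Grid points whose plausibility exceeds alpha (a 1 - alpha prediction set)."""
    return [y for y in grid if plausibility(xs, ys, x_new, y) > alpha]


if __name__ == "__main__":
    # Hypothetical toy data, used only to exercise the functions.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
    grid = [i / 10 for i in range(150, 220)]  # candidate responses 15.0 to 21.9
    print(prediction_set(xs, ys, 9.0, grid, alpha=0.2))
    print(upper_probability(xs, ys, 9.0, [y for y in grid if y > 20.0]))
```

The finite grid only approximates the supremum over an assertion; a finer grid, or exploiting the piecewise structure of the contour, would tighten it.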

Publication:

arXiv e-prints

Pub Date:

December 2021

DOI:

10.48550/arXiv.2112.10234

arXiv:

arXiv:2112.10234

Bibcode:

2021arXiv211210234C

Keywords:

Mathematics - Statistics Theory;
Statistics - Methodology

E-Print:

29 pages, 4 figures, 2 tables. Comments welcome at https://researchers.one/articles/21.12.00002