A PAC-Bayesian Analysis of Distance-Based Classifiers: Why Nearest-Neighbour works!

doi:10.48550/arXiv.2109.13889

Abstract:

We present PAC-Bayesian bounds for the generalisation error of the K-nearest-neighbour classifier (K-NN). This is achieved by casting the K-NN classifier into a kernel-space framework in the limit of vanishing kernel bandwidth. We establish a relation between prior measures over the coefficients in the kernel expansion and the induced measure on the weight vectors in kernel space. Defining a sparse prior over the coefficients allows the application of a PAC-Bayesian folk theorem, which leads to a generalisation bound that is a function of the number of redundant training examples: those that can be left out without changing the solution. The bound requires quantifying a prior belief in the sparseness of the solution and is evaluated after learning, when the actual redundancy level is known. Even for small sample sizes (m ~ 100), the bound gives non-trivial results when both the expected sparseness and the actual redundancy are high.
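The vanishing-bandwidth limit mentioned in the abstract can be illustrated with a small sketch. It is not the paper's construction: it assumes a Gaussian kernel, labels in {-1, +1}, uniform expansion coefficients, and only the K = 1 case, and the function names `nn_predict` and `kernel_predict` are hypothetical. It shows that a kernel expansion with a small bandwidth reproduces the 1-NN decision, while a wide bandwidth does not.

```python
import numpy as np

def sq_dists(X_test, X_train):
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diff = X_test[:, None, :] - X_train[None, :, :]
    return (diff ** 2).sum(axis=2)

def nn_predict(X_train, y_train, X_test):
    # Plain 1-nearest-neighbour rule: copy the label of the closest training point.
    return y_train[sq_dists(X_test, X_train).argmin(axis=1)]

def kernel_predict(X_train, y_train, X_test, bandwidth):
    # Sign of sum_i y_i * exp(-||x - x_i||^2 / (2 * bandwidth^2)).
    # Assumption: Gaussian kernel with equal coefficients for all training points.
    d2 = sq_dists(X_test, X_train)
    # Subtracting the row minimum rescales each row by a positive factor,
    # so the sign is unchanged but exp() no longer underflows to all zeros.
    shifted = d2 - d2.min(axis=1, keepdims=True)
    weights = np.exp(-shifted / (2.0 * bandwidth ** 2))
    return np.sign((weights * y_train[None, :]).sum(axis=1)).astype(int)

# Toy 1-D data: one positive point surrounded by negatives.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1, -1, -1, -1])
X_test = np.array([[0.4], [2.6]])

nn = nn_predict(X_train, y_train, X_test)
narrow = kernel_predict(X_train, y_train, X_test, bandwidth=0.1)
wide = kernel_predict(X_train, y_train, X_test, bandwidth=5.0)

print("1-NN:            ", nn)
print("kernel, sigma=0.1:", narrow)
print("kernel, sigma=5.0:", wide)
```

With the narrow bandwidth the nearest training point dominates the sum, so the kernel classifier agrees with 1-NN on both test points. With the wide bandwidth every training point contributes almost equally and the majority label wins, so the test point at 0.4 is classified differently.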

Publication:

arXiv e-prints

Pub Date:

September 2021

DOI:

10.48550/arXiv.2109.13889

arXiv:

arXiv:2109.13889

Bibcode:

2021arXiv210913889G

Keywords:

Computer Science - Machine Learning;
Statistics - Machine Learning

E-Print:

This article was submitted to ICML 2000 and rejected
