Benefit of Interpolation in Nearest Neighbor Algorithms

doi:10.48550/arXiv.1909.11720

Benefit of Interpolation in Nearest Neighbor Algorithms

The over-parameterized models attract much attention in the era of data science and deep learning. It is empirically observed that although these models, e.g. deep neural networks, over-fit the training data, they can still achieve small testing error, and sometimes even {\em outperform} traditional algorithms which are designed to avoid over-fitting. The major goal of this work is to sharply quantify the benefit of data interpolation in the context of nearest neighbors (NN) algorithm. Specifically, we consider a class of interpolated weighting schemes and then carefully characterize their asymptotic performances. Our analysis reveals a U-shaped performance curve with respect to the level of data interpolation, and proves that a mild degree of data interpolation {\em strictly} improves the prediction accuracy and statistical stability over those of the (un-interpolated) optimal $k$NN algorithm. This theoretically justifies (predicts) the existence of the second U-shaped curve in the recently discovered double descent phenomenon. Note that our goal in this study is not to promote the use of interpolated-NN method, but to obtain theoretical insights on data interpolation inspired by the aforementioned phenomenon.

Publication:

arXiv e-prints

Pub Date:

September 2019

DOI:

10.48550/arXiv.1909.11720

arXiv:

arXiv:1909.11720

Bibcode:

2019arXiv190911720X

Keywords:

Statistics - Machine Learning;
Computer Science - Machine Learning

E-Print:

Under review as a conference paper at ICLR 2020

NASA/ADS

Benefit of Interpolation in Nearest Neighbor Algorithms

Abstract