Pseudo-Labeling for Kernel Ridge Regression under Covariate Shift

doi:10.48550/arXiv.2302.10160

Wang, Kaizheng

We develop and analyze a principled approach to kernel ridge regression under covariate shift. The goal is to learn a regression function with small mean squared error over a target distribution, using unlabeled data from the target distribution together with labeled data whose feature distribution may differ. We propose to split the labeled data into two subsets and to run kernel ridge regression on each separately, obtaining a collection of candidate models and an imputation model. We use the imputation model to fill in the missing labels on the target data and then select the candidate that performs best against these pseudo-labels. Our non-asymptotic excess risk bounds demonstrate that our estimator adapts effectively to both the structure of the target distribution and the covariate shift. This adaptation is quantified through a notion of effective sample size that reflects the value of labeled source data for the target regression task. Our estimator achieves the minimax optimal error rate up to a polylogarithmic factor, and we find that using pseudo-labels for model selection does not significantly hinder performance.
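The split, impute, and select procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes a Gaussian (RBF) kernel with a fixed bandwidth, a random half/half split of the labeled source data (first half for candidates, second half for imputation), a grid `lams` of ridge penalties that indexes the candidate models, and a single fixed penalty `lam_impute` for the imputation model. All function and parameter names are hypothetical, and the paper's actual choices of kernel, split sizes, and penalty levels may differ.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth=1.0):
    # Gaussian kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)).
    # ASSUMPTION: the kernel and its fixed bandwidth are illustrative choices.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def krr_fit(X, y, lam, bandwidth=1.0):
    # Kernel ridge regression: minimize (1/n) * sum_i (f(x_i) - y_i)^2 + lam * ||f||_H^2.
    # By the representer theorem, f(z) = sum_i alpha_i k(z, x_i)
    # with alpha = (K + n * lam * I)^{-1} y.
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda Z: rbf_kernel(Z, X, bandwidth) @ alpha


def pseudo_label_krr(X_src, y_src, X_tgt, lams, lam_impute, bandwidth=1.0, seed=0):
    # HYPOTHETICAL helper: split the labeled source data into two halves at random.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X_src.shape[0])
    half = X_src.shape[0] // 2
    i1, i2 = idx[:half], idx[half:]

    # Candidate models: one KRR fit per penalty in the grid, on the first half.
    candidates = [krr_fit(X_src[i1], y_src[i1], lam, bandwidth) for lam in lams]

    # Imputation model: a single KRR fit on the second half.
    # ASSUMPTION: lam_impute is supplied by the user, not tuned here.
    impute = krr_fit(X_src[i2], y_src[i2], lam_impute, bandwidth)

    # Pseudo-labels on the unlabeled target covariates.
    pseudo = impute(X_tgt)

    # Select the candidate with the smallest mean squared error against the pseudo-labels.
    errors = [float(np.mean((f(X_tgt) - pseudo) ** 2)) for f in candidates]
    best = int(np.argmin(errors))
    return candidates[best], lams[best], errors
```

For example, with one-dimensional source covariates drawn from N(0, 1), target covariates drawn from a shifted distribution such as N(1, 0.25), and a penalty grid like `[1e-4, 1e-3, 1e-2, 1e-1, 1.0]`, the function returns the candidate whose predictions on the target covariates best agree with the imputed labels. Note that model selection touches only unlabeled target covariates: the imputation model stands in for the unknown target labels, which is why it is fit on data disjoint from the candidates.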

Publication:

arXiv e-prints

Pub Date:

February 2023

DOI:

10.48550/arXiv.2302.10160

arXiv:

arXiv:2302.10160

Bibcode:

2023arXiv230210160W

Keywords:

Statistics - Methodology;
Computer Science - Machine Learning;
Mathematics - Statistics Theory;
Statistics - Machine Learning;
62J07;
62G05

E-Print:

45 pages, 2 figures