Matrix completion with data-dependent missingness probabilities
Abstract
The problem of completing a large matrix with many missing entries has received widespread attention in the last couple of decades. Two popular approaches to the matrix completion problem are based on singular value thresholding and nuclear norm minimization. Most past work on this subject assumes that there is a single number $p$ such that each entry of the matrix is available independently with probability $p$ and missing otherwise. This assumption may not be realistic for many applications. In this work, we replace it with the assumption that the probability that an entry is available is an unknown function $f$ of the entry itself. For example, if the entry is the rating given to a movie by a viewer, then it seems plausible that high-value entries have a greater probability of being available than low-value entries. We propose two new estimators, based on singular value thresholding and nuclear norm minimization, to recover the matrix under this assumption. The estimators involve no tuning parameters and are shown to be consistent under a low rank assumption. We also provide a consistent estimator of the unknown function $f$.
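To make the setting concrete, the following is a minimal sketch of the missingness model described above, together with a classical uniform-$p$ singular value thresholding baseline. It does not implement the estimators proposed in the paper, which this abstract does not specify. The function names `simulate` and `naive_svt`, the linear choice $f(x) = 0.2 + 0.6x$, the matrix size, and the threshold $(2+\eta)\sqrt{n\hat{p}}$ are all assumptions made for illustration.

```python
import numpy as np


def simulate(n=300, rank=2, seed=0):
    """Draw a low-rank matrix with entries in [0, 1] and a data-dependent mask.

    Assumption: f(x) = 0.2 + 0.6 * x is an illustrative increasing function;
    in the setting of the abstract, f is unknown.
    """
    rng = np.random.default_rng(seed)
    U = rng.uniform(0.0, 1.0, size=(n, rank))
    V = rng.uniform(0.0, 1.0, size=(n, rank))
    M = U @ V.T / rank  # rank <= `rank`, every entry lies in [0, 1]
    prob_observed = 0.2 + 0.6 * M  # f applied entrywise
    mask = rng.uniform(size=M.shape) < prob_observed
    return M, mask


def naive_svt(M, mask, eta=0.01):
    """Baseline that treats missingness as uniform with a single probability p.

    Missing entries are set to zero, singular values below a threshold are
    discarded, and the result is rescaled by the observed fraction p_hat.
    """
    n = max(M.shape)
    Y = np.where(mask, M, 0.0)  # only observed entries are used
    p_hat = mask.mean()  # single plug-in estimate of p
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    keep = s >= (2.0 + eta) * np.sqrt(n * p_hat)  # assumed threshold
    est = (U[:, keep] * s[keep]) @ Vt[keep] / p_hat
    return np.clip(est, 0.0, 1.0)


if __name__ == "__main__":
    M, mask = simulate()
    est = naive_svt(M, mask)
    print("fraction observed:", mask.mean())
    print("mean of all entries:", M.mean())
    print("mean of observed entries:", M[mask].mean())
    print("baseline mean squared error:", np.mean((est - M) ** 2))
```

Because larger entries are more likely to be observed in this simulation, the mean of the observed entries overstates the mean of the full matrix, which is the kind of selection effect that a single-$p$ rescaling does not account for.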
- Publication: arXiv e-prints
- Pub Date: June 2021
- DOI: 10.48550/arXiv.2106.02290
- arXiv: arXiv:2106.02290
- Bibcode: 2021arXiv210602290B
- Keywords: Mathematics - Statistics Theory; Computer Science - Information Theory; Mathematics - Probability; Statistics - Methodology
- E-Print: 28 pages, 9 figures. To appear in IEEE Trans. Inf. Theory