The Sample Complexity of Multi-Distribution Learning for VC Classes
Abstract
Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. In particular, though we understand the sample complexity of learning a VC dimension d class on $k$ distributions to be $O(\epsilon^{-2} \ln(k)(d + k) + \min\{\epsilon^{-1} dk, \epsilon^{-4} \ln(k) d\})$, the best lower bound is $\Omega(\epsilon^{-2}(d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2023
- DOI:
- 10.48550/arXiv.2307.12135
- arXiv:
- arXiv:2307.12135
- Bibcode:
- 2023arXiv230712135A
- Keywords:
-
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- E-Print:
- 11 pages. Authors are ordered alphabetically. Open problem presented at the 36th Annual Conference on Learning Theory