Towards Demystifying Representation Learning with Noncontrastive Selfsupervision
Abstract
Noncontrastive methods of selfsupervised learning (such as BYOL and SimSiam) learn representations by minimizing the distance between two views of the same image. These approaches have achieved remarkable performance in practice, but it is not well understood 1) why these methods do not collapse to the trivial solutions and 2) how the representation is learned. Tian el al. (2021) made an initial attempt on the first question and proposed DirectPred that sets the predictor directly. In our work, we analyze a generalized version of DirectPred, called DirectSet($\alpha$). We show that in a simple linear network, DirectSet($\alpha$) provably learns a desirable projection matrix and also reduces the sample complexity on downstream tasks. Our analysis suggests that weight decay acts as an implicit threshold that discard the features with high variance under augmentation, and keep the features with low variance. Inspired by our theory, we simplify DirectPred by removing the expensive eigendecomposition step. On CIFAR10, CIFAR100, STL10 and ImageNet, DirectCopy, our simpler and more computationally efficient algorithm, rivals or even outperforms DirectPred.
 Publication:

arXiv eprints
 Pub Date:
 October 2021
 arXiv:
 arXiv:2110.04947
 Bibcode:
 2021arXiv211004947W
 Keywords:

 Computer Science  Machine Learning;
 Statistics  Machine Learning