Understanding Deep Contrastive Learning via Coordinate-wise Optimization

doi:10.48550/arXiv.2201.12680

Understanding Deep Contrastive Learning via Coordinate-wise Optimization

Tian, Yuandong

We show that Contrastive Learning (CL) under a broad family of loss functions (including InfoNCE) has a unified formulation of coordinate-wise optimization on the network parameter $\boldsymbol{\theta}$ and pairwise importance $\alpha$, where the \emph{max player} $\boldsymbol{\theta}$ learns representation for contrastiveness, and the \emph{min player} $\alpha$ puts more weights on pairs of distinct samples that share similar representations. The resulting formulation, called $\alpha$-CL, unifies not only various existing contrastive losses, which differ by how sample-pair importance $\alpha$ is constructed, but also is able to extrapolate to give novel contrastive losses beyond popular ones, opening a new avenue of contrastive loss design. These novel losses yield comparable (or better) performance on CIFAR10, STL-10 and CIFAR-100 than classic InfoNCE. Furthermore, we also analyze the max player in detail: we prove that with fixed $\alpha$, max player is equivalent to Principal Component Analysis (PCA) for deep linear network, and almost all local minima are global and rank-1, recovering optimal PCA solutions. Finally, we extend our analysis on max player to 2-layer ReLU networks, showing that its fixed points can have higher ranks.

Publication:

arXiv e-prints

Pub Date:

January 2022

DOI:

10.48550/arXiv.2201.12680

arXiv:

arXiv:2201.12680

Bibcode:

2022arXiv220112680T

Keywords:

Computer Science - Machine Learning;
Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Neural and Evolutionary Computing

E-Print:

Add code links

NASA/ADS

Understanding Deep Contrastive Learning via Coordinate-wise Optimization

Abstract