Discrete Infomax Codes for Supervised Representation Learning
Abstract
Learning compact discrete representations of data is a key task on its own or for facilitating subsequent processing of data. In this paper we present a model that produces Discrete InfoMax Codes (DIMCO); we learn a probabilistic encoder that yields k-way d-dimensional codes associated with input data. Our model's learning objective is to maximize the mutual information between codes and labels with a regularization, which enforces entries of a codeword to be as independent as possible. We show that the infomax principle also justifies previous loss functions (e.g., cross-entropy) as its special cases. Our analysis also shows that using shorter codes, as DIMCO does, reduces overfitting in the context of few-shot classification. Through experiments in various domains, we observe this implicit meta-regularization effect of DIMCO. Furthermore, we show that the codes learned by DIMCO are efficient in terms of both memory and retrieval time compared to previous methods.
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2019
- DOI:
- 10.48550/arXiv.1905.11656
- arXiv:
- arXiv:1905.11656
- Bibcode:
- 2019arXiv190511656L
- Keywords:
-
- Statistics - Machine Learning;
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Machine Learning
- E-Print:
- 19 pages