mixup: Beyond Empirical Risk Minimization
Abstract
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
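As a rough sketch of the core idea (not the authors' reference implementation), a mixed training example can be built as a single convex combination of two labeled examples; the helper name `mixup_batch` and the Beta-distributed mixing coefficient with hyperparameter `alpha` are illustrative assumptions here:

```python
import numpy as np

def mixup_batch(x1, y1, x2, y2, alpha=1.0):
    # Draw the mixing coefficient lambda from a Beta(alpha, alpha) distribution.
    lam = np.random.beta(alpha, alpha)
    # Convex combination of the inputs and of the (one-hot) labels.
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y
```

In a typical use, labels are represented as one-hot (or probability) vectors so the label combination is well defined, and the two examples are drawn from the same minibatch, e.g. by pairing the batch with a shuffled copy of itself.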
- Publication: arXiv e-prints
- Pub Date: October 2017
- DOI: 10.48550/arXiv.1710.09412
- arXiv: arXiv:1710.09412
- Bibcode: 2017arXiv171009412Z
- Keywords: Computer Science - Machine Learning; Statistics - Machine Learning
- E-Print: ICLR camera ready version. Changes vs V1: fix repo URL