Shake-Shake regularization
Abstract
The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://github.com/xgastaldi/shake-shake
- Publication:
-
arXiv e-prints
- Pub Date:
- May 2017
- DOI:
- 10.48550/arXiv.1705.07485
- arXiv:
- arXiv:1705.07485
- Bibcode:
- 2017arXiv170507485G
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Computer Vision and Pattern Recognition