We introduce three new robustness benchmarks consisting of naturally occurring distribution changes in image style, geographic location, camera operation, and more. Using our benchmarks, we take stock of previously proposed hypotheses for out-of-distribution robustness and put them to the test. We find that using larger models and synthetic data augmentation can improve robustness on real-world distribution shifts, contrary to claims in prior work. Motivated by this, we introduce a new data augmentation method which advances the state-of-the-art and outperforms models pretrained with 1000x more labeled data. We find that some methods consistently help with distribution shifts in texture and local image statistics, but these methods do not help with some other distribution shifts like geographic changes. Hence no evaluated method consistently improves robustness. We conclude that future research must study multiple distribution shifts simultaneously.
- Pub Date:
- June 2020
- Computer Science - Computer Vision and Pattern Recognition;
- Computer Science - Machine Learning;
- Statistics - Machine Learning
- Datasets, code, and models available at https://github.com/hendrycks/imagenet-r