Can deep learning match the efficiency of human visual long-term memory in storing object details?
Abstract
Humans have a remarkably large capacity to store detailed visual information in long-term memory even after a single exposure, as demonstrated by classic experiments in psychology. For example, Standing (1973) showed that humans could recognize with high accuracy thousands of pictures that they had seen only once a few days prior to a recognition test. In deep learning, the primary mode of incorporating new information into a model is through gradient descent in the model's parameter space. This paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory to incorporate new information in a rigorous, head-to-head, quantitative comparison. We answer this in the negative: even in the best case, models learning via gradient descent require approximately 10 exposures to the same visual materials in order to reach a recognition memory performance humans achieve after only a single exposure. Prior knowledge induced via pretraining and bigger model sizes improve performance, but these improvements are not very visible after a single exposure (it takes a few exposures for the improvements to become apparent), suggesting that simply scaling up the pretraining data size or model size might not be a feasible strategy to reach human-level memory efficiency.
- Publication:
-
arXiv e-prints
- Pub Date:
- April 2022
- DOI:
- 10.48550/arXiv.2204.13061
- arXiv:
- arXiv:2204.13061
- Bibcode:
- 2022arXiv220413061E
- Keywords:
-
- Computer Science - Machine Learning;
- Computer Science - Neural and Evolutionary Computing;
- Quantitative Biology - Neurons and Cognition
- E-Print:
- v3: mostly stylistic changes, no changes in main content