Can deep learning match the efficiency of human visual long-term memory in storing object details?

doi:10.48550/arXiv.2204.13061

Emin Orhan, A.

Humans have a remarkably large capacity to store detailed visual information in long-term memory, even after a single exposure, as classic experiments in psychology have demonstrated. For example, Standing (1973) showed that humans could recognize, with high accuracy, thousands of pictures that they had seen only once, a few days before a recognition test. In deep learning, the primary mode of incorporating new information into a model is through gradient descent in the model's parameter space. This paper asks, in a rigorous, head-to-head, quantitative comparison, whether deep learning via gradient descent can match the efficiency with which human visual long-term memory incorporates new information. We answer this in the negative: even in the best case, models learning via gradient descent require approximately 10 exposures to the same visual materials to reach the recognition memory performance that humans achieve after only a single exposure. Prior knowledge induced via pretraining and larger model sizes improve performance, but these improvements are barely visible after a single exposure (it takes a few exposures for them to become apparent), suggesting that simply scaling up the pretraining data size or model size might not be a feasible strategy for reaching human-level memory efficiency.
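To make "exposure" and "recognition memory performance" concrete, here is a toy sketch under stated assumptions; it is not the paper's models, stimuli, or procedure. It assumes (hypothetically) a linear autoencoder `W` trained on random unit vectors that stand in for images, treats one exposure as one gradient-descent step on each studied item, and scores memory with a two-alternative forced-choice (2AFC) test in which the model "picks" whichever of a studied item and a novel foil it reconstructs with lower loss. The function names (`sgd_exposure`, `two_afc_accuracy`, `run`) and all parameter values are illustrative inventions.

```python
# Toy illustration (hypothetical): NOT the paper's models or procedure.
import math
import random


def unit_vector(dim, rng):
    """Draw a random Gaussian vector and scale it to unit length (a stand-in 'image')."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def reconstruction_loss(W, x):
    """Squared error ||W x - x||^2 of a linear autoencoder W on item x."""
    return sum((sum(w * xj for w, xj in zip(row, x)) - xi) ** 2
               for row, xi in zip(W, x))


def sgd_exposure(W, items, lr):
    """One hypothetical 'exposure': a single gradient step on each studied item, in order.

    For L = ||W x - x||^2 the gradient with respect to W is 2 (W x - x) x^T.
    """
    for x in items:
        residual = [sum(w * xj for w, xj in zip(row, x)) - xi
                    for row, xi in zip(W, x)]
        for i, r in enumerate(residual):
            row = W[i]
            for j, xj in enumerate(x):
                row[j] -= lr * 2.0 * r * xj


def two_afc_accuracy(studied_losses, foil_losses):
    """Fraction of pairs where the studied item has the lower loss (ties count 0.5)."""
    score = 0.0
    for s, f in zip(studied_losses, foil_losses):
        if s < f:
            score += 1.0
        elif s == f:
            score += 0.5
    return score / len(studied_losses)


def run(dim=50, n_items=10, exposures=(1, 2, 5, 10, 20), lr=0.25, seed=0):
    """Train on studied items and report 2AFC accuracy after each exposure count."""
    rng = random.Random(seed)
    studied = [unit_vector(dim, rng) for _ in range(n_items)]
    foils = [unit_vector(dim, rng) for _ in range(n_items)]
    W = [[0.0] * dim for _ in range(dim)]  # untrained model: no prior knowledge
    results = {}
    done = 0
    for k in sorted(exposures):
        while done < k:
            sgd_exposure(W, studied, lr)
            done += 1
        results[k] = two_afc_accuracy(
            [reconstruction_loss(W, x) for x in studied],
            [reconstruction_loss(W, x) for x in foils],
        )
    return results


if __name__ == "__main__":
    for k, acc in run().items():
        print(f"{k:2d} exposures: 2AFC accuracy = {acc:.2f}")
```

Because this linear toy is noise-free and its foils are unrelated random vectors, studied items separate from foils within very few exposures; its output illustrates the mechanics only and says nothing about the exposure counts reported in the abstract. A closer analogue would need a realistic image model, real pictures, and foils that differ from studied items only in fine object details.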

Publication: arXiv e-prints
Pub Date: April 2022
DOI: 10.48550/arXiv.2204.13061
arXiv: arXiv:2204.13061
Bibcode: 2022arXiv220413061E
Keywords: Computer Science - Machine Learning; Computer Science - Neural and Evolutionary Computing; Quantitative Biology - Neurons and Cognition
E-Print: v3: mostly stylistic changes, no changes in main content
