An efficient framework for zero-shot sketch-based image retrieval

doi:10.1016/j.patcog.2022.108528

An efficient framework for zero-shot sketch-based image retrieval

Zero-shot sketch-based image retrieval (ZS-SBIR) has recently attracted the attention of the computer vision community due to its real-world applications, and the more realistic and challenging setting that it presents over SBIR. ZS-SBIR inherits the main challenges of multiple computer vision problems including content-based Image Retrieval (CBIR), zero-shot learning and domain adaptation. The majority of previous studies using deep neural networks have achieved improved results by either projecting sketch and images into a common low-dimensional space, or transferring knowledge from seen to unseen classes. However, those approaches are trained with complex frameworks composed of multiple deep convolutional neural networks (CNNs) and are dependent on category-level word labels. This increases the requirements for training resources and datasets. In comparison, we propose a simple and efficient framework that does not require high computational training resources, and learns the semantic embedding space from a vision model rather than a language model, as is done by related studies. Furthermore, at training and inference stages our method only uses a single CNN. In this work, a pre-trained ImageNet CNN (i.e., ResNet50) is fine-tuned with three proposed learning objects: domain-balanced quadruplet loss, semantic classification loss, and semantic knowledge preservation loss. The domain-balanced quadruplet and semantic classification losses are introduced to learn discriminative, semantic and domain invariant features by considering ZS-SBIR as an object detection and verification problem. To preserve semantic knowledge learned with ImageNet and exploit it for unseen categories, the semantic knowledge preservation loss is proposed. To reduce computational cost and increase the accuracy of the semantic knowledge distillation process, ground-truth semantic knowledge is prepared in a class-oriented fashion prior to training. Extensive experiments are conducted on three challenging ZS-SBIR datasets: Sketchy Extended, TU-Berlin Extended and QuickDraw Extended. The proposed method achieves state-of-the-art results, and outperforms the majority of related works by a substantial margin.

Publication:

Pattern Recognition

Pub Date:

June 2022

DOI:

10.1016/j.patcog.2022.108528

arXiv:

arXiv:2102.04016

Bibcode:

2022PatRe.12608528T

Keywords:

Sketch-based image retrieval;
Zero-shot learning;
Knowledge distillation;
Similarity learning;
Computer Science - Computer Vision and Pattern Recognition

NASA/ADS

An efficient framework for zero-shot sketch-based image retrieval

Abstract