Can the Query-based Object Detector Be Designed with Fewer Stages?
Abstract
Query-based object detectors have made significant advancements since the publication of DETR. However, most existing methods still rely on multi-stage encoders and decoders, or a combination of both. Despite achieving high accuracy, the multi-stage paradigm (typically consisting of 6 stages) suffers from issues such as heavy computational burden, prompting us to reconsider its necessity. In this paper, we explore multiple techniques to enhance query-based detectors and, based on these findings, propose a novel model called GOLO (Global Once and Local Once), which follows a two-stage decoding paradigm. Compared to other mainstream query-based models with multi-stage decoders, our model employs fewer decoder stages while still achieving considerable performance. Experimental results on the COCO dataset demonstrate the effectiveness of our approach.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2023
- DOI:
- 10.48550/arXiv.2309.16306
- arXiv:
- arXiv:2309.16306
- Bibcode:
- 2023arXiv230916306L
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition