Universal Object Detection with Large Vision Model

doi:10.48550/arXiv.2212.09408

In recent years, interest has grown in building broad, general-purpose computer vision systems. Such systems could address a wide range of vision tasks at once, without being limited to specific problems or data domains, and this universality is crucial for practical, real-world applications. In this study, we focus on one specific challenge: large-scale, multi-domain universal object detection, a step toward the broader goal of a universal vision system. The problem poses several intricate challenges, including duplicated category labels across datasets, label conflicts, and the need to handle hierarchical taxonomies. To address them, we present our approach to label handling, hierarchy-aware loss design, and resource-efficient model training built on a pre-trained large vision model. Our method placed second in the object detection track of the Robust Vision Challenge 2022 (RVC 2022) on a million-scale cross-dataset object detection benchmark. We hope this study will serve as a useful reference and offer an alternative approach for similar problems in the computer vision community. The source code is openly available at https://github.com/linfeng93/Large-UniDet.

Publication: arXiv e-prints
Pub Date: December 2022
DOI: 10.48550/arXiv.2212.09408
arXiv: arXiv:2212.09408
Bibcode: 2022arXiv221209408L
Keywords: Computer Science - Computer Vision and Pattern Recognition
E-Print: Accepted by the International Journal of Computer Vision (IJCV). Second place in the object detection track of the Robust Vision Challenge 2022 (RVC 2022).
