GREC: Generalized Referring Expression Comprehension
Abstract
The objective of Classic Referring Expression Comprehension (REC) is to produce a bounding box corresponding to the object mentioned in a given textual description. Commonly, existing datasets and techniques in classic REC are tailored for expressions that pertain to a single target, meaning a sole expression is linked to one specific object. Expressions that refer to multiple targets or involve no specific target have not been taken into account. This constraint hinders the practical applicability of REC. This study introduces a new benchmark termed as Generalized Referring Expression Comprehension (GREC). This benchmark extends the classic REC by permitting expressions to describe any number of target objects. To achieve this goal, we have built the first large-scale GREC dataset named gRefCOCO. This dataset encompasses a range of expressions: those referring to multiple targets, expressions with no specific target, and the single-target expressions. The design of GREC and gRefCOCO ensures smooth compatibility with classic REC. The proposed gRefCOCO dataset, a GREC method implementation code, and GREC evaluation code are available at https://github.com/henghuiding/gRefCOCO.
- Publication:
-
arXiv e-prints
- Pub Date:
- August 2023
- DOI:
- arXiv:
- arXiv:2308.16182
- Bibcode:
- 2023arXiv230816182H
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- GREC Technical Report, Project Page: https://henghuiding.github.io/GRES