Segmenting Object Affordances: Reproducibility and Sensitivity to Scale
Abstract
Visual affordance segmentation identifies the regions of an object in an image that an agent can interact with. Existing methods re-use and adapt learning-based architectures for semantic segmentation to the affordance segmentation task and evaluate them on small datasets. However, experimental setups are often not reproducible, which leads to unfair and inconsistent comparisons. In this work, we benchmark these methods under a reproducible setup on two single-object scenarios, tabletop without occlusions and hand-held containers, to facilitate future comparisons. We include a version of a recent architecture, Mask2Former, re-trained for affordance segmentation, and show that this model performs best on most test sets of both scenarios. Our analysis shows that the models are not robust to scale variations when object resolutions differ from those in the training set.
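To make the scale-sensitivity finding concrete, the sketch below computes per-class intersection-over-union (IoU, the Jaccard index) between a predicted and a ground-truth affordance mask and averages it within object-size buckets. This is not the authors' evaluation protocol. The mask encoding (2-D grids of integer class ids with 0 as background), the use of foreground-area fraction as a proxy for object scale, and the bucket thresholds are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical encoding: a mask is a 2-D grid of integer class ids, 0 = background.
Mask = Sequence[Sequence[int]]


def per_class_iou(pred: Mask, gt: Mask, num_classes: int) -> Dict[int, float]:
    """IoU for each non-background class that appears in pred or gt."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p == g:
                # Pixel agrees: counts once in both intersection and union.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it enlarges the union of both classes involved.
                union[p] += 1
                union[g] += 1
    return {c: inter[c] / union[c] for c in range(1, num_classes) if union[c] > 0}


def object_area_fraction(gt: Mask) -> float:
    """Fraction of pixels covered by any affordance class (a crude scale proxy)."""
    total = sum(len(row) for row in gt)
    foreground = sum(1 for row in gt for v in row if v != 0)
    return foreground / total


def miou_by_scale(
    samples: List[Tuple[Mask, Mask]],
    num_classes: int,
    edges: Tuple[float, float] = (0.05, 0.2),  # hypothetical small/medium cut-offs
) -> Dict[str, float]:
    """Mean IoU per image, averaged separately for small, medium and large objects."""
    buckets: Dict[str, List[float]] = defaultdict(list)
    for pred, gt in samples:
        frac = object_area_fraction(gt)
        name = "small" if frac < edges[0] else "medium" if frac < edges[1] else "large"
        ious = per_class_iou(pred, gt, num_classes)
        if ious:
            buckets[name].append(sum(ious.values()) / len(ious))
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}
```

A model that is robust to scale would show similar scores across the buckets; a large gap between, say, the "small" and "large" buckets is one simple signal of the kind of sensitivity the abstract reports.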
- Publication: arXiv e-prints
- Pub Date: September 2024
- arXiv: arXiv:2409.01814
- Bibcode: 2024arXiv240901814A
- Keywords: Computer Science - Computer Vision and Pattern Recognition
- E-Print: Paper accepted to the Workshop on Assistive Computer Vision and Robotics (ACVR) at the European Conference on Computer Vision (ECCV) 2024