Unified Perceptual Parsing for Scene Understanding
Abstract
Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect objects inside, while also identifying the textures and surfaces of the objects along with their different compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it is able to effectively segment a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes. Models are available at \url{https://github.com/CSAILVision/unifiedparsing}.
- Publication:
-
arXiv e-prints
- Pub Date:
- July 2018
- DOI:
- 10.48550/arXiv.1807.10221
- arXiv:
- arXiv:1807.10221
- Bibcode:
- 2018arXiv180710221X
- Keywords:
-
- Computer Science - Computer Vision and Pattern Recognition
- E-Print:
- Accepted to European Conference on Computer Vision (ECCV) 2018