Unified Perceptual Parsing for Scene Understanding

doi:10.48550/arXiv.1807.10221

Unified Perceptual Parsing for Scene Understanding

Humans recognize the visual world at multiple levels: we effortlessly categorize scenes and detect objects inside, while also identifying the textures and surfaces of the objects along with their different compositional parts. In this paper, we study a new task called Unified Perceptual Parsing, which requires the machine vision systems to recognize as many visual concepts as possible from a given image. A multi-task framework called UPerNet and a training strategy are developed to learn from heterogeneous image annotations. We benchmark our framework on Unified Perceptual Parsing and show that it is able to effectively segment a wide range of concepts from images. The trained networks are further applied to discover visual knowledge in natural scenes. Models are available at \url{https://github.com/CSAILVision/unifiedparsing}.

Publication:

arXiv e-prints

Pub Date:

July 2018

DOI:

10.48550/arXiv.1807.10221

arXiv:

arXiv:1807.10221

Bibcode:

2018arXiv180710221X

Keywords:

Computer Science - Computer Vision and Pattern Recognition

E-Print:

Accepted to European Conference on Computer Vision (ECCV) 2018

NASA/ADS

Unified Perceptual Parsing for Scene Understanding

Abstract