Robust 2D/3D Vehicle Parsing in CVIS

doi:10.48550/arXiv.2103.06432

Robust 2D/3D Vehicle Parsing in CVIS

We present a novel approach to robustly detect and perceive vehicles in different camera views as part of a cooperative vehicle-infrastructure system (CVIS). Our formulation is designed for arbitrary camera views and makes no assumptions about intrinsic or extrinsic parameters. First, to deal with multi-view data scarcity, we propose a part-assisted novel view synthesis algorithm for data augmentation. We train a part-based texture inpainting network in a self-supervised manner. Then we render the textured model into the background image with the target 6-DoF pose. Second, to handle various camera parameters, we present a new method that produces dense mappings between image pixels and 3D points to perform robust 2D/3D vehicle parsing. Third, we build the first CVIS dataset for benchmarking, which annotates more than 1540 images (14017 instances) from real-world traffic scenarios. We combine these novel algorithms and datasets to develop a robust approach for 2D/3D vehicle parsing for CVIS. In practice, our approach outperforms SOTA methods on 2D detection, instance segmentation, and 6-DoF pose estimation, by 4.5%, 4.3%, and 2.9%, respectively. More details and results are included in the supplement. To facilitate future research, we will release the source code and the dataset on GitHub.

Publication:

arXiv e-prints

Pub Date:

March 2021

DOI:

10.48550/arXiv.2103.06432

arXiv:

arXiv:2103.06432

Bibcode:

2021arXiv210306432M

Keywords:

Computer Science - Computer Vision and Pattern Recognition;
Computer Science - Robotics

NASA/ADS

Robust 2D/3D Vehicle Parsing in CVIS

Abstract