In this paper, we introduce a novel single-shot approach for 6D pose estimation of rigid objects from depth images. We employ a fully convolutional neural network in which the 3D input data is spatially discretized and pose estimation is treated as a regression task solved locally on the resulting volume elements. Running at 65 fps on a GPU, our Object Pose Network (OP-Net) is extremely fast, is optimized end-to-end, and estimates the 6D poses of multiple objects in the image simultaneously. Our approach does not require manually annotated real-world 6D pose datasets and transfers to the real world despite being trained entirely on synthetic data. We evaluate the proposed method on public benchmark datasets, where it significantly outperforms state-of-the-art methods.
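To make the idea of regressing poses locally on volume elements concrete, here is a minimal sketch of how a per-cell output grid might be decoded into object poses. It is not taken from the paper: the function name `decode_grid`, the channel layout (one confidence value followed by three translation and three rotation values per cell), and the confidence threshold are all assumptions chosen for illustration.

```python
import numpy as np


def decode_grid(pred, conf_threshold=0.5):
    """Collect pose hypotheses from a per-cell prediction grid.

    pred: array of shape (rows, cols, 7). The channel layout
    [confidence, tx, ty, tz, rx, ry, rz] is a hypothetical choice for
    this sketch, not the output format used by OP-Net.
    """
    poses = []
    rows, cols, _ = pred.shape
    for i in range(rows):
        for j in range(cols):
            confidence = float(pred[i, j, 0])
            # Keep only cells that confidently claim to contain an object.
            if confidence >= conf_threshold:
                poses.append({
                    "cell": (i, j),
                    "confidence": confidence,
                    "translation": pred[i, j, 1:4].copy(),
                    "rotation": pred[i, j, 4:7].copy(),
                })
    return poses


# Usage: a 4x4 grid with one confident cell and one weak cell.
grid = np.zeros((4, 4, 7))
grid[1, 2] = [0.9, 0.1, 0.2, 0.3, 0.0, 0.0, 1.0]
grid[3, 0, 0] = 0.2  # below the threshold, so it is discarded
detections = decode_grid(grid)
```

Because every cell carries its own prediction, a single forward pass can yield poses for several objects at once, which is consistent with the simultaneous multi-object estimation described in the abstract.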
- Pub Date: April 2020
- Computer Science - Computer Vision and Pattern Recognition
- Computer Science - Robotics
- Electrical Engineering and Systems Science - Image and Video Processing
- Accepted at 2020 IEEE International Conference on Robotics and Automation (ICRA 2020)