Simultaneous Object Reconstruction and Grasp Prediction using a Camera-centric Object Shell Representation
Abstract
Being able to grasp objects is a fundamental component of most robotic manipulation systems. In this paper, we present a new approach to simultaneously reconstruct a mesh and a dense grasp quality map of an object from a depth image. At the core of our approach is a novel camera-centric object representation called the "object shell" which is composed of an observed "entry image" and a predicted "exit image". We present an image-to-image residual ConvNet architecture in which the object shell and a grasp-quality map are predicted as separate output channels. The main advantage of the shell representation and the corresponding neural network architecture, ShellGrasp-Net, is that the input-output pixel correspondences in the shell representation are explicitly represented in the architecture. We show that this coupling yields superior generalization capabilities for object reconstruction and accurate grasp quality estimation implicitly considering the object geometry. Our approach yields an efficient dense grasp quality map and an object geometry estimate in a single forward pass. Both of these outputs can be used in a wide range of robotic manipulation applications. With rigorous experimental validation, both in simulation and on a real setup, we show that our shell-based method can be used to generate precise grasps and the associated grasp quality with over 90% accuracy. Diverse grasps computed on shell reconstructions allow the robot to select and execute grasps in cluttered scenes with more than 93% success rate.
- Publication:
-
arXiv e-prints
- Pub Date:
- September 2021
- DOI:
- 10.48550/arXiv.2109.06837
- arXiv:
- arXiv:2109.06837
- Bibcode:
- 2021arXiv210906837C
- Keywords:
-
- Computer Science - Robotics
- E-Print:
- 18 pages, 12 figures, 8 tables