Learning Visual Affordances with Target-Orientated Deep Q-Network to Grasp Objects by Harnessing Environmental Fixtures
Abstract
This paper introduces a challenging object grasping task and proposes a self-supervised learning approach. The goal of the task is to grasp an object that cannot be grasped with a single parallel gripper alone, but only by harnessing environmental fixtures (e.g., walls, furniture, heavy objects). This Slide-to-Wall grasping task assumes no prior knowledge except a partial observation of the target object. Hence, the robot must learn an effective policy from a scene observation that may include the target object, environmental fixtures, and other disturbing objects. We formulate the problem as visual affordance learning, for which a Target-Oriented Deep Q-Network (TO-DQN) is proposed to efficiently learn visual affordance maps (i.e., Q-maps) that guide robot actions. Since training requires the robot to explore and collide with the fixtures, TO-DQN is first trained safely with a simulated robot manipulator and then applied to a real robot. We empirically show that TO-DQN learns to solve the task in different environment settings in simulation and outperforms both a standard Deep Q-Network (DQN) and a DQN variant in training efficiency and robustness. Test performance in both simulation and real-robot experiments shows that the policy trained by TO-DQN achieves performance comparable to that of humans.
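As an illustration of how a Q-map can guide robot actions, the following is a minimal sketch of greedy action selection over pixel-wise Q-values. It is not the authors' implementation: the abstract does not specify the action parametrization, so the layout assumed here (one Q-map channel per discrete motion direction, one Q-value per pixel location) and the names `QMap`, `greedy_action`, and `example` are hypothetical.

```python
from typing import List, Tuple

# Assumed layout (not from the paper): q_maps[direction][row][col] holds the
# predicted Q-value of starting a motion primitive at pixel (row, col) in the
# given discrete direction.
QMap = List[List[List[float]]]


def greedy_action(q_maps: QMap) -> Tuple[int, int, int]:
    """Return (direction, row, col) of the largest Q-value in q_maps."""
    best = None
    best_q = float("-inf")
    for d, channel in enumerate(q_maps):
        for r, row in enumerate(channel):
            for c, q in enumerate(row):
                if q > best_q:
                    best_q = q
                    best = (d, r, c)
    if best is None:
        raise ValueError("q_maps must contain at least one value")
    return best


# Toy 2-direction, 2x2-pixel Q-maps with made-up values.
example = [
    [[0.1, 0.2], [0.0, 0.3]],
    [[0.05, 0.9], [0.4, 0.1]],
]
action = greedy_action(example)
```

During training, a rule of this kind would typically be combined with some form of exploration (for example epsilon-greedy); that pairing is a common DQN practice and is not stated in the abstract.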
- Publication: arXiv e-prints
- Pub Date: October 2019
- DOI: 10.48550/arXiv.1910.03781
- arXiv: arXiv:1910.03781
- Bibcode: 2019arXiv191003781L
- Keywords: Computer Science - Robotics; Computer Science - Computer Vision and Pattern Recognition
- E-Print: To appear at ICRA 2021, Xi'an