Goal-Auxiliary Actor-Critic for 6D Robotic Grasping with Point Clouds

Robotic grasping is a challenging task, especially when top-down bin-picking is insufficient. Complex 6D grasping involves both 3D translation and 3D rotation of the robot gripper, as in grasping a cereal box on a tabletop. A recent paper on arXiv.org proposes a novel method for learning 6D grasping policies from point clouds of objects.

The method combines imitation learning, with demonstrations supplied by a planner, and reinforcement learning on known objects. The policy directly outputs the control action of the robot gripper. The introduced algorithm uses goal prediction as an auxiliary task to improve the performance of the actor-critic algorithm. Experiments show that the learned policy can also grasp unseen objects. Moreover, the policy can be fine-tuned on unknown objects using hindsight goals from successful episodes, which enables continual learning.
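The idea of goal prediction as an auxiliary task can be illustrated with a minimal sketch: a main actor or critic loss is augmented with a weighted goal-prediction error. The function name `goal_auxiliary_loss`, the mean-squared-error term, and the weight `aux_weight` are assumptions made for illustration; the paper may define its goal loss differently.

```python
def goal_auxiliary_loss(main_loss, predicted_goal, target_goal, aux_weight=0.1):
    """Add a weighted goal-prediction error to a main actor or critic loss.

    All names and the MSE form are illustrative assumptions, not the
    paper's exact formulation.
    """
    if not target_goal or len(predicted_goal) != len(target_goal):
        raise ValueError("goal vectors must be non-empty and of equal length")
    # Mean squared error between the predicted and the supervised grasp goal.
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted_goal, target_goal)]
    goal_error = sum(squared_errors) / len(squared_errors)
    # The auxiliary term shapes the shared features without replacing the main loss.
    return main_loss + aux_weight * goal_error
```

With a perfect goal prediction the auxiliary term vanishes and the main loss is returned unchanged; a larger `aux_weight` puts more emphasis on predicting the grasp goal.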

6D robotic grasping beyond top-down bin-picking scenarios is a challenging task. Previous solutions based on 6D grasp synthesis with robot motion planning usually operate in an open-loop setting without considering the dynamics and contacts of objects, which makes them sensitive to grasp synthesis errors. In this work, we propose a novel method for learning closed-loop control policies for 6D robotic grasping using point clouds from an egocentric camera. We combine imitation learning and reinforcement learning in order to grasp unseen objects and handle the continuous 6D action space, where expert demonstrations are obtained from a joint motion and grasp planner. We introduce a goal-auxiliary actor-critic algorithm, which uses grasping goal prediction as an auxiliary task to facilitate policy learning. The supervision on grasping goals can be obtained from the expert planner for known objects or from hindsight goals for unknown objects. Overall, our learned closed-loop policy achieves over 90% success rates on grasping various ShapeNet objects and YCB objects in the simulation. Our video can be found at this https URL.
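As a rough sketch of how hindsight goals could supply supervision for unknown objects, the snippet below pairs every step of a successful episode with the pose the gripper finally reached. The function name `hindsight_goal_labels`, the pose representation (opaque tuples), and the choice to discard failed episodes are assumptions for illustration; a real system would likely express the goal relative to the current gripper frame.

```python
def hindsight_goal_labels(gripper_poses, success):
    """Pair each step of an episode with a hindsight grasp goal.

    Assumption: the final gripper pose of a successful episode serves as
    the goal label for all of its steps. Failed episodes yield no labels.
    """
    if not success or not gripper_poses:
        return []
    final_pose = gripper_poses[-1]
    # Each entry is (pose at that step, goal pose reached at the end).
    return [(pose, final_pose) for pose in gripper_poses]


# Hypothetical two-step episode with poses reduced to 3D positions.
labels = hindsight_goal_labels([(0.0, 0.0, 0.0), (0.0, 0.0, 0.1)], success=True)
```

The resulting pairs could then serve as goal targets in place of planner-provided goals when no expert is available for the object.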

Link: https://arxiv.org/abs/2010.00824