Adapting Skills to Novel Grasps: A Self-Supervised Approach
Georgios Papagiannis, Kamil Dreczkowski, Vitalis Vosylius and Edward Johns
In this paper, we address the problem of adapting manipulation skills involving grasped objects (e.g. tools) learned for a single grasp pose to novel grasp poses. Most robot learning methods address this by explicitly learning skills over a range of grasps, which is highly inefficient. Instead, we propose a method that adapts such skills directly, requiring only a period of self-supervised data collection during which a camera observes the robot’s end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), works with RGB images, depth images, or both, and requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we show that our method outperforms 5 baselines for both RGB and depth modalities. Specifically, compared to the baselines, our method determines the relative transformation between grasp poses with 71.1% higher accuracy and adapts skills to novel grasp poses with a 28.4% higher success rate on average. Supplementary material, code, and videos of the experiments can be found on this webpage.
Key Idea. In a self-supervised manner, the robot emulates possible grasps by moving its end-effector with an object rigidly grasped while an external camera captures images of that object. A network trained on these images is then used at deployment to immediately adapt a skill learned for a single object grasp to any novel object grasp. This process adapts skills to novel grasp poses with very high accuracy using any camera modality, while requiring no prior object knowledge, camera calibration, or human time.
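The geometry behind this idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes poses are 4x4 homogeneous matrices, and the helper names (`make_pose`, `emulate_grasp`, `adapt_trajectory`) and all numeric values are hypothetical. In the sketch, both grasps are known only so the result can be checked; in the method described above, the relative transformation is estimated from images by the trained network, without object models or camera calibration.

```python
import numpy as np


def make_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous transform from a yaw angle and a translation."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    T[:3, 3] = translation
    return T


def emulate_grasp(T_WE_ref, T_delta):
    """End-effector pose that places the held object where a different grasp would.

    T_delta is an offset in the end-effector frame. The robot commands it,
    so it is known and can act as a training label without human annotation.
    """
    return T_WE_ref @ T_delta


def adapt_trajectory(ee_poses_world, T_corr):
    """Right-multiply each recorded end-effector pose by the grasp correction."""
    return [T_WE @ T_corr for T_WE in ee_poses_world]


# Hypothetical grasps: object pose expressed in the end-effector frame.
T_EO_skill = make_pose(0.0, [0.0, 0.0, 0.10])
T_EO_deploy = make_pose(0.3, [0.01, -0.02, 0.12])

# Correction that maps the deployment grasp back onto the skill grasp.
T_corr = T_EO_skill @ np.linalg.inv(T_EO_deploy)

# A short recorded skill: end-effector poses in the world frame.
skill_traj = [make_pose(0.1 * k, [0.4, 0.05 * k, 0.3]) for k in range(3)]
adapted_traj = adapt_trajectory(skill_traj, T_corr)

# Object poses: skill grasp on the original trajectory versus
# deployment grasp on the adapted trajectory.
obj_skill = [T @ T_EO_skill for T in skill_traj]
obj_deploy = [T @ T_EO_deploy for T in adapted_traj]
max_err = max(np.abs(a - b).max() for a, b in zip(obj_skill, obj_deploy))
print(max_err < 1e-9)
```

Because the object is rigidly attached to the end-effector, a single correction in the end-effector frame is enough to make the object follow the same path under the new grasp, which is why one estimated relative transformation can adapt an entire skill.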
Manipulating grasped objects, such as tools, is at the core of robotic manipulation. However, existing skill learning methods, such as reinforcement or imitation learning algorithms, either assume that the grasped object's pose remains unchanged between the skill learning and skill deployment phases, or require explicit training of that skill over a wide range of grasps. The former is limiting, as object grasps often change; the latter is very inefficient, as learning a skill for every possible grasp can be time consuming. Despite these important limitations, the problem of adapting a skill learned for a single grasp to novel grasps has received little attention. The methods that do exist rely on: (1) prior object knowledge, such as 3D CAD models, or object category-specific training data, neither of which may be readily available in many practical scenarios; (2) depth images, which can be noisy or have missing values; and (3) precise knowledge of a camera’s extrinsic parameters, which is difficult to obtain because camera calibration is challenging. As a result, existing methods are either inapplicable to many practical scenarios, because prior object knowledge is not available, or cannot adapt skills with high accuracy, because their reliance on depth data and camera calibration hinders their performance.
Contributions. Motivated by the above limitations, this work contributes a novel method that adapts skills learned for a single grasp pose to novel grasps with high accuracy while (1) assuming no prior object knowledge, such as a 3D CAD model, (2) operating with only RGB images, (3) remaining robust to missing depth data when depth images are used, and (4) requiring no camera calibration.
Results Overview. In 1360 real-world experimental evaluations, we demonstrate that our method determines how the object grasp has changed between the skill learning and deployment phases with 71.1% higher accuracy, and adapts skills with a 28.4% higher success rate, compared to 5 strong baselines.
How does self-supervised data collection help?
How is the self-supervised data used?
How is the method deployed in practice?
Examples of adapting skills taught with the imitation learning method DOME to different novel deployment grasps
Examples of adapting a precise peg-in-hole skill to a novel deployment grasp
Adapting different skills manipulating the same object to different grasps zero-shot using the same alignment network