In this paper, we study the problem of adapting manipulation skills involving grasped objects (e.g. tools) learned for a single grasp pose to different novel grasp poses. Most robot learning methods address this by explicitly learning skills over a range of grasps, but this is highly inefficient. Instead, we propose a method to adapt such skills directly, requiring only a period of self-supervised data collection, during which a camera observes the robot’s end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), it can work with RGB images, depth images or both, and it requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we show that it consistently outperforms 5 baselines for both RGB and depth modalities. Specifically, compared to the best-performing baseline, our method results in an average of 28.5% higher success rate when adapting skills to novel grasps on several everyday tasks.
Key Idea. In a self-supervised manner, the robot emulates possible grasps by moving its end-effector with an object rigidly grasped, while an external camera captures images of that object. From the captured images, we train a network that adapts a skill learned for a single object grasp to any novel object grasp immediately during skill deployment. This process allows skills to be adapted to novel grasp poses with very high accuracy using any camera modality, while requiring no prior object knowledge, camera calibration, or human time.
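The self-supervised collection phase described above can be sketched as a simple loop that pairs each image with the end-effector pose that produced it. The paper does not prescribe a specific robot or camera API, so `sample_end_effector_pose` and `capture_image` below are hypothetical placeholders; a real system would substitute its own motion and camera interfaces:

```python
import random


def sample_end_effector_pose():
    # Hypothetical stand-in for commanding the arm to a random reachable
    # pose (here, a 6-DoF pose: 3 translation + 3 rotation components).
    return [random.uniform(-0.1, 0.1) for _ in range(3)] + \
           [random.uniform(-0.5, 0.5) for _ in range(3)]


def capture_image(pose):
    # Hypothetical stand-in for an external, uncalibrated RGB or depth
    # camera observing the rigidly grasped object at this pose.
    return {"pixels": None, "pose": pose}  # placeholder for image data


def collect_self_supervised_data(num_samples):
    # Because the object is rigidly grasped, moving the end-effector
    # emulates observing the object under different grasps; each sample
    # pairs an image with the pose that produced it, giving supervision
    # for free (no human labeling required).
    dataset = []
    for _ in range(num_samples):
        pose = sample_end_effector_pose()
        image = capture_image(pose)
        dataset.append((image, pose))
    return dataset


data = collect_self_supervised_data(100)
```

The resulting image–pose pairs are what the alignment network is trained on; no camera extrinsics are needed, since the network learns the relationship between images and end-effector poses directly.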
Manipulating grasped objects, such as tools, is at the core of robot manipulation. However, existing skill learning methods, such as reinforcement or imitation learning algorithms, either assume that the grasped object's pose remains unchanged during both skill learning and skill deployment, or require explicit training of that skill over a wide range of grasps. The former is limiting, as object grasps often change, and the latter is very inefficient, as learning a skill for every possible grasp is time consuming. Despite these important limitations, the problem of adapting a skill learned for a single grasp to different novel grasps has received little attention. And while there exist methods to adapt skills learned for a single grasp pose to novel grasps, they rely on: (1) prior object knowledge, such as 3D CAD models, or object category-specific training data, both of which may not be readily available in many practical scenarios; (2) depth images, which can be noisy or have missing depth values; and (3) precise knowledge of a camera’s extrinsic parameters, which is difficult to obtain due to the challenges of camera calibration. As a result, existing methods are either not applicable to many practical scenarios, because prior object knowledge is unavailable, or fail to adapt skills successfully, because their reliance on depth data and camera calibration hinders their performance.
Contributions. Motivated by these limitations, this work contributes a novel method that adapts skills learned for a single grasp pose to novel grasps with considerably higher success rates than previous methods, while (1) assuming no prior object knowledge, such as a 3D CAD model, (2) operating with only RGB images if desired, (3) remaining robust to missing depth data when using depth images, and (4) requiring no camera calibration.
Results Overview. In 1360 real-world experimental evaluations, we demonstrate that our method adapts skills with an average of 28.5% higher success rate than the best of 5 strong baselines.
How does self-supervised data collection help?
How is the self-supervised data used?
How is the method deployed in practice?
Examples of adapting skills taught with the imitation learning method DOME to different novel deployment grasps
Examples of adapting a precise peg-in-hole skill to a novel deployment grasp
Adapting different skills manipulating the same object to different grasps zero-shot using the same alignment network
Once our alignment network is trained for a grasped object, it can be used to adapt skills across any task that uses that object, zero-shot and with no further training. The following video shows how the same alignment network is used across 3 different tasks to adapt skills to novel grasps immediately after the skills are taught to the robot. The video is uncut and demonstrates both the skill teaching and skill adaptation process for all tasks sequentially.