Learning Multi-Stage Tasks with One Demonstration via Self-Replay

Norman Di Palo and Edward Johns

Published at CoRL 2021

[Link to Paper]

[BibTex]

Abstract

In this work, we introduce a novel method for learning everyday-like multi-stage tasks from a single human demonstration, without requiring any prior object knowledge. Inspired by the recent Coarse-to-Fine Imitation Learning, we model imitation learning as a learned object-reaching phase followed by an open-loop replay of the operator's actions. We build upon this for multi-stage tasks: following the human demonstration, the robot autonomously collects image data for the entire multi-stage task by reaching the next object in the sequence and then replaying the demonstration, repeating in a loop for all stages of the task. We evaluate with real-world experiments on a set of everyday multi-stage tasks, which, as we show, our method can solve from a single demonstration.

Video

4-Minute Summary

Motivation

Imitation Learning should minimise the amount of human effort needed to teach a task. Modern methods often need substantial effort from the human operator in one form or another. Behavioural cloning requires tens or hundreds of demonstrations. Reinforcement Learning, while able to learn to solve tasks guided by a reward function, needs thousands of episodes and environment resets from the operator. Meta Learning methods, able to adapt one-shot at test time, still require substantial meta-training beforehand. Engineering ad-hoc solutions is also time-consuming, and requires prior knowledge of the objects at hand.

We introduce Self-Replay, a method that can solve everyday-like multi-stage tasks from scratch with one demonstration. Without the need for any prior knowledge, our method allows a human operator to teach a task by spending only a minute manually controlling the robot.

Coarse-to-Fine Imitation Learning

Our method extends the Coarse-to-Fine Imitation Learning framework. Coarse-to-Fine Imitation Learning shifts the Imitation Learning paradigm from learning the full policy to learning to align the end-effector with the bottleneck pose. The bottleneck pose is the relative pose between the robot and the object at the beginning of the demonstration. If the robot can accurately align itself to that pose in novel configurations of the environment, then repeating the demonstration's actions is enough to solve the task. We suggest reading the 5-minute summary of this method here.

Tackling Multi-Stage Tasks

We extend that method to tackle multi-stage tasks, enabling it to solve a wide variety of everyday tasks. We decompose each task into a series of stages, and each stage is composed of a reaching, or coarse, phase and an interaction, or fine, phase.

The human operator provides a single demonstration: for each stage, they move the end-effector to the bottleneck pose (arbitrarily selected by the operator), and then move the end-effector to manipulate the object. This two-step procedure is then repeated for each stage until the task is solved.
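As a concrete sketch (the names and pose representation are illustrative, not taken from the paper), the single demonstration could be stored as one record per stage, each holding the operator's chosen bottleneck pose together with the interaction actions that follow it:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Pose = Tuple[float, float, float, float]  # x, y, z, yaw (illustrative)

@dataclass
class StageDemo:
    bottleneck_pose: Pose                  # pose chosen by the operator
    interaction_actions: List[Pose] = field(default_factory=list)  # fine-phase motions

def record_demonstration(stages):
    """Store, per stage, the bottleneck pose and the subsequent actions."""
    demo = []
    for bottleneck, actions in stages:
        demo.append(StageDemo(bottleneck_pose=bottleneck,
                              interaction_actions=list(actions)))
    return demo
```

Storing the demonstration stage by stage is what later lets the robot collect data for each stage independently.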
 

The goal of the robot therefore becomes to learn to reach the bottleneck pose of each stage.

(Example in the GIFs below, of the human manually controlling the robot. Top row, left: reaching the first bottleneck pose; right: interacting with the object. Bottom row, same pattern for the second stage.)


Self-Replay

Once the robot has received the demonstration, the human operator needs to reset the environment just once. After this step, the entire data-collection procedure is autonomous and self-supervised. The algorithm stores the bottleneck pose for each stage of the demonstration, along with the actions executed during the fine, interaction phase.

Since the robot's goal at test time is to reach the same relative pose with respect to the object (the bottleneck pose), during training it moves around the environment, observing the object from different perspectives and learning the optimal action for each relative pose. Because we use a wrist-mounted camera, only the relative pose between the end-effector and the objects influences the observed image. Therefore, moving the end-effector has the same effect as moving the object to different positions and orientations, but the former can be done autonomously.

When enough data has been gathered for a stage, the robot reaches the bottleneck and then interacts with the object by replaying the demonstration actions, before moving on to collect data for the next stage.
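The loop above can be sketched as follows (a simplified illustration; every function argument is a hypothetical stand-in for the robot's API, and poses are plain tuples rather than full 6-DoF transforms): for each stage, the robot wanders the workspace collecting wrist-camera images labelled with the offset to the current bottleneck, then reaches the bottleneck and replays the demonstrated interaction to set the scene up for the next stage.

```python
def self_replay(num_stages, demo, sample_pose, observe, move_to, replay,
                samples_per_stage=100):
    """Self-supervised data collection for a multi-stage task.

    After the single human demonstration and one environment reset, the robot
    alternates between gathering (image, offset-to-bottleneck) pairs and
    replaying the demonstrated interaction, once per stage.
    """
    datasets = []
    for stage in range(num_stages):
        bottleneck = demo[stage]["bottleneck_pose"]
        data = []
        for _ in range(samples_per_stage):
            pose = sample_pose()                    # random end-effector pose
            move_to(pose)
            image = observe()                       # wrist-camera image
            label = [b - p for b, p in zip(bottleneck, pose)]  # offset to bottleneck
            data.append((image, label))             # self-supervised training pair
        datasets.append(data)
        move_to(bottleneck)                         # reach the bottleneck...
        replay(demo[stage]["actions"])              # ...and replay the interaction
    return datasets
```

The key property is that the labels come for free: the robot always knows its own end-effector pose, so every wandering step yields a supervised training pair without any human involvement.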
 

(Example in the GIFs below, of the robot performing self-supervised data collection. Top row, left: the robot moves around the workspace to observe the object from different viewpoints; right: when enough data is collected, it replicates the interaction demonstration and then moves to the second stage. Bottom row: same patterns for the second stage.)

Results

Our method is able to solve a series of everyday-like tasks, as shown in the GIFs below.

We compared Self-Replay with state-of-the-art Imitation Learning and Reinforcement Learning methods. Our method is an order of magnitude more time-efficient than the baselines (measured as the time spent by the operator), and also more sample-efficient thanks to the coarse-to-fine decomposition, which allows the robot to focus on learning to reach the bottleneck pose instead of trying to learn the full task. The graphs below show these comparisons.

Finally, to show the versatility and robustness of our method, we gave a live demo at CoRL 2021, where we both trained and tested our method live at the conference. The GIF below shows an example of the learned controller when trained with just one demonstration. The controller generalises across both object poses and distractors.

(Live-demo GIF, shown at 4x speed.)

For more information, please read the paper and watch the video.