
Learning a Thousand Tasks in a Day


Kamil Dreczkowski*,  Pietro Vitiello*, Vitalis Vosylius and Edward Johns
* joint first authorship

Published on the cover of Science Robotics, November 2025 issue

Abstract

Humans are remarkably efficient at learning tasks from demonstrations, but today’s imitation learning methods for robot manipulation often require hundreds or thousands of demonstrations per task. We investigated two fundamental priors for improving learning efficiency: decomposition of manipulation trajectories into sequential alignment and interaction phases, and retrieval-based generalisation. Through 3450 real-world rollouts, we systematically studied this decomposition. We compared different design choices for the alignment and interaction phases and examined generalisation and scaling trends relative to today’s dominant paradigm of behavioural cloning with a single-phase monolithic policy. In the few-demonstrations-per-task regime (<10 demonstrations), decomposition achieved an order-of-magnitude improvement in data efficiency over single-phase learning, with retrieval consistently outperforming behavioural cloning for both alignment and interaction. Building on these insights, we developed Multi-Task Trajectory Transfer (MT3), an imitation learning method based on decomposition and retrieval. MT3 learns everyday manipulation tasks from as few as a single demonstration each while also generalising to previously unseen object instances. This efficiency enabled us to teach a robot 1000 distinct everyday tasks in under 24 hours of human demonstrator time. Through 2200 additional real-world rollouts, we reveal MT3’s capabilities and limitations across different task families.

Explainer Video

Key Idea: A new, highly efficient imitation learning paradigm enables teaching a robot 1000 distinct tasks in just 17 hours.


Learning how to manipulate objects is likely to be fundamental for the robots of the future. However, the prevailing learning algorithms for object manipulation are very data hungry, requiring hundreds if not thousands of real-world demonstrations for each new task learnt. As a result, putting together a comprehensive dataset covering the majority of everyday tasks will require considerable financial and human resources.


In this work, we study the design choices that led to Multi-Task Trajectory Transfer (MT3), a method capable of learning a new task from as few as a single demonstration. Below you can find videos of its deployment in real-world household environments. Each task below is a multi-stage operation learnt from a single demonstration per stage.

Clean Up Desk

Clean Sink

Put Back Comb

Turn Oven On


Scaling Up MT3

Could MT3's efficiency scale to learning a truly diverse range of real-world manipulation tasks? 

To answer this, we conducted an unprecedented robotic manipulation study, teaching a robot 1000 distinct manipulation tasks in under 24 hours using just a single demonstration per task. Below is a video of us collecting the one thousand demonstrations on a single robot.


This represents the first work to demonstrate learning manipulation skills at this scale without relying on large demonstration datasets, dramatically surpassing previous studies, which typically focused on tens or hundreds of tasks while requiring many more demonstrations per task. The focus on diversity is further underscored by the fact that these tasks fall under 31 different macro skills and make use of 402 different objects.


To evaluate MT3, we then performed 2200 evaluation rollouts, testing its performance on the 1000 seen tasks as well as on 100 unseen ones. The testing environment was made more challenging by the inclusion of distractor objects and frequent changes in lighting conditions and background colours. In the paper we look into these results and analyse the failure cases of MT3. Below you can find example rollouts from the evaluations.

What makes MT3 so efficient?

The prevailing technique in imitation learning for object manipulation is behaviour cloning (BC), which uses demonstrations for training and then, at inference, directly predicts actions with the trained neural network. We propose an alternative learning paradigm, which we refer to as retrieval-based algorithms. Such methods do not require robotics data at training time but instead rely on it at test time as a form of guidance. To this end, we devised our own retrieval pipeline to autonomously select the best demonstration to use at inference. Because these methods are more directly influenced by relevant demonstrations, we show how this increases their learning efficiency compared to BC.
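As a rough illustration of what such a pipeline involves, the sketch below selects the stored demonstration whose scene embedding is nearest to the live observation under cosine similarity. The function name, the embedding representation, and the similarity measure are our own illustrative assumptions, not the exact pipeline from the paper.

```python
import numpy as np

def retrieve_demonstration(live_embedding: np.ndarray,
                           demo_embeddings: np.ndarray) -> int:
    """Return the index of the stored demonstration whose embedding is
    most similar to that of the live observation (cosine similarity).

    live_embedding:  (D,) feature vector describing the current scene.
    demo_embeddings: (N, D) matrix of feature vectors, one per demonstration.
    """
    # Normalise so the dot product equals cosine similarity.
    live = live_embedding / np.linalg.norm(live_embedding)
    demos = demo_embeddings / np.linalg.norm(demo_embeddings, axis=1, keepdims=True)
    return int(np.argmax(demos @ live))  # best-matching demonstration
```

The key point is that the demonstrations guide behaviour at test time rather than being baked into network weights during training.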

Secondly, we found that a structural prior, which decomposes manipulation trajectories into two sequential phases of reasoning, leads to highly efficient imitation learning.


  1. Alignment Phase: Before interacting with an object, the robot must move its end-effector to a pose relative to the target object that is sensible for the upcoming manipulation. The specific path taken to reach this pose is not critical, as long as the robot satisfies environmental constraints during its motion. For example, in a plug insertion task, the robot can take many different paths to position the plug in front of the socket.

  2. Interaction Phase: This phase consists of the actual manipulation and requires precise execution, as the specific trajectory is crucial for task success. For example, during the actual insertion of the plug into the socket, the motion must be carefully controlled to ensure a proper connection (see the sketch after this list).
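To make the decomposition concrete, here is a minimal, hypothetical control loop under the two-phase prior. All names (estimate_object_pose, plan_path, the demo fields) are illustrative assumptions rather than the paper's actual interfaces; poses are 4x4 homogeneous transforms.

```python
def execute_task(robot, demo, estimate_object_pose, plan_path):
    """Hypothetical two-phase execution under the decomposition prior.

    demo.T_obj_start   : 4x4 end-effector alignment pose in the object frame.
    demo.T_obj_interact: list of 4x4 end-effector poses in the object frame.
    """
    T_world_obj = estimate_object_pose()  # live pose of the target object

    # Alignment phase: only the final pose matters, not the path taken,
    # so any collision-free planner can produce the approach motion.
    T_world_align = T_world_obj @ demo.T_obj_start
    for waypoint in plan_path(robot.current_pose(), T_world_align):
        robot.move_to(waypoint)

    # Interaction phase: the trajectory itself matters, so the
    # demonstrated motion is followed precisely, waypoint by waypoint.
    for T_obj_ee in demo.T_obj_interact:
        robot.move_to(T_world_obj @ T_obj_ee)
```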

 

We show that by using two specialised policies, one optimised for aligning with objects and the other optimised for interacting with them, we achieve an order-of-magnitude efficiency gain compared to using a single monolithic policy to handle entire manipulation trajectories.

 

We investigate two contrasting approaches for designing the alignment and interaction policies: BC and retrieval-based methods. We also compare each combination of them to a monolithic algorithm, which instead learns the entire trajectory.

How effective are these design choices?

To better assess the impact of trajectory decomposition and retrieval-based algorithms on learning efficiency, we devised a set of controlled experiments.


To evaluate any combination of algorithm designs, we considered five different methods:

  • BC-BC: Behaviour Cloning for Alignment, Behaviour Cloning for Interaction.

  • BC-Ret: Behaviour Cloning for Alignment, Retrieval-based Interaction.

  • Ret-BC: Retrieval-based Alignment, Behaviour Cloning for Interaction.

  • Ret-Ret (MT3): Retrieval-based Alignment, Retrieval-based Interaction.

  • MT-ACT+: Behaviour Cloning policy trained on the entire trajectory.
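For clarity, the five designs can be summarised as a mapping from method name to the paradigm handling each phase. This summary-as-code is our own illustration, not an artefact from the paper.

```python
# Which learning paradigm handles each phase, per evaluated method.
# "monolithic" means one policy covers the whole trajectory, with no
# alignment/interaction split (the MT-ACT+ baseline).
METHODS = {
    "BC-BC":         ("behaviour_cloning", "behaviour_cloning"),
    "BC-Ret":        ("behaviour_cloning", "retrieval"),
    "Ret-BC":        ("retrieval",         "behaviour_cloning"),
    "Ret-Ret (MT3)": ("retrieval",         "retrieval"),
    "MT-ACT+":       "monolithic",  # no phase decomposition
}
```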


Through 3450 real-world experimental rollouts across 70 different objects, we systematically analyse the effects of the decomposition prior and retrieval-based generalisation on learning efficiency. By varying both task counts and demonstrations per task, we examine how each method performs across different data regimes, with a focus on scenarios with limited per-task data.

 

The results from this experiment are unambiguous: decomposing manipulation trajectories into alignment and interaction phases outperforms learning trajectories with a single monolithic policy, especially when learning from only a few demonstrations per task. Furthermore, using retrieval-based methods to align and interact with objects leads to more efficient learning than when using BC alternatives. 

Examples of the behaviour of each method can be found below for two of the several tasks considered: scooping from a pan and inserting bread into a toaster.

Scooping from a pan: MT-ACT+, BC-BC, BC-Ret, Ret-BC, MT3 (Ret-Ret)

Inserting bread into a toaster: MT-ACT+, BC-BC, BC-Ret, Ret-BC, MT3 (Ret-Ret)

The difference in failures

Retrieval-based

When interacting with an object, retrieval-based methods perform the same motions as the demonstrations.

Because of this direct influence, even when failing, retrieval-based methods might approach the object incorrectly, but their motions stay faithful to the demonstrated behaviour.
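Concretely, "performing the same motions as the demonstrations" can be pictured as replaying the demonstrated end-effector trajectory relative to the newly estimated object pose. The sketch below assumes known 4x4 homogeneous object poses and is an illustration, not the paper's implementation.

```python
import numpy as np

def transfer_trajectory(T_world_obj_live: np.ndarray,
                        T_world_obj_demo: np.ndarray,
                        demo_ee_poses: list) -> list:
    """Replay a demonstrated end-effector trajectory on a re-posed object.

    All poses are 4x4 homogeneous transforms in the world frame. The
    end-effector's motion relative to the object is preserved exactly,
    which is why retrieval-based execution mirrors the demonstration.
    """
    # Rigid correction mapping the demo-time object pose to the live one.
    T_correction = T_world_obj_live @ np.linalg.inv(T_world_obj_demo)
    return [T_correction @ T_ee for T_ee in demo_ee_poses]
```

Under this scheme, an error in the estimated live object pose shifts the entire replayed trajectory, which is consistent with the pose-estimation failures discussed below.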

Behaviour Cloning

On the other hand, behaviour cloning can end up in out-of-distribution states. In such situations, the policy behaves in more unexpected ways.


We noticed that predicting termination was particularly hard for the BC-based methods we considered. This resulted in some executions ending with incorrect and potentially undesirable motions.
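For context, "predicting termination" means the BC policy must decide, at every step, whether the episode is finished. A hypothetical head for this is sketched below; the architecture and names are our illustration, not the model used in the paper.

```python
import torch
import torch.nn as nn

class PolicyWithTermination(nn.Module):
    """Hypothetical BC head predicting an action and a stop probability."""

    def __init__(self, feat_dim: int, action_dim: int):
        super().__init__()
        self.action_head = nn.Linear(feat_dim, action_dim)
        self.stop_head = nn.Linear(feat_dim, 1)

    def forward(self, features: torch.Tensor):
        action = self.action_head(features)
        # Probability that the episode should terminate at this step;
        # misjudging this keeps the rollout running with spurious motions.
        p_stop = torch.sigmoid(self.stop_head(features))
        return action, p_stop
```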

Additional MT3 Failure Examples

The main failure modes of MT3 are pose estimation errors and mistakes in the retrieval of the correct trajectory. We share here some examples of what these might look like.

Additional MT-ACT+ Failure Examples

The main failure modes of MT-ACT+ are imprecise alignments and challenges in replicating complex trajectories. Below are examples of these factors at play.

However, unlike the retrieval-based interaction policy, tackling this phase with behavioural cloning allows for failure recovery.
