MSc Individual Projects

Above: A selection of images from projects in previous years

If you are interested in applications of machine learning and computer vision to robotics, then you may be interested in one of the five MSc individual projects described below. Each is a genuine research project, based on a new idea which I have recently been thinking about. So if you are productive during the project, then there is every chance that you will be able to get a publication at a top international conference, which would look fantastic on your CV. And for top students, there may also be an opportunity to apply for a PhD position in my lab.


Projects will be done remotely, with regular meetings with me on Microsoft Teams. However, if College opens up over the summer, then for some projects there may be an opportunity to test out your final method on the robots in our lab.


If you are interested in one or more of these projects, please send me an email. In your email, please attach your CV, and include a brief statement (~5 sentences) on what your potential career plans are after graduation, and how the project fits in with those plans. I will then arrange a meeting where we can discuss the project(s) in more detail. Further departmental guidelines can be found by clicking here.

I look forward to hearing from you!

Tabular reinforcement learning and deep reinforcement learning are two different ways for agents to learn through interaction and exploration. Both have their pros and cons. Tabular methods are stable since they store independent Q-values for each state-action pair, but they do not scale well to high-dimensional state or action spaces. Function approximation methods (e.g. deep reinforcement learning) scale naturally to high-dimensional and continuous spaces, but are notoriously unstable. You will have found this in the Reinforcement Learning coursework, when your trained agent suddenly "forgot" how to solve the task, and its policy changed abruptly.
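
To make the "independent Q-values" point concrete, here is a minimal sketch of a standard tabular Q-learning update. The state and action names, the learning rate `alpha` and the discount `gamma` are illustrative assumptions, not values from the coursework. Note that the update touches exactly one table entry, and leaves every other entry unchanged.

```python
from collections import defaultdict

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update; only the entry for (state, action) changes."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])

# Hypothetical two-state example: unseen entries default to 0.0.
q = defaultdict(float)
q[("s1", "left")] = 1.0
q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=["left", "right"])
```

With function approximation, the same update would instead shift the shared network weights, and so could also alter the prediction for `("s1", "left")`.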

In this project, we will re-visit tabular reinforcement learning, and try to do a better job of making this method work. The idea is to address the scalability issue of tabular methods with a curriculum learning approach. Here, the state and action spaces would initially be discretised very coarsely, such that all of the Q-values can easily fit into memory. Learning would then proceed until the policy's performance saturates. Then, the state and action spaces would be discretised further, to create a better approximation of the real world. However, rather than storing Q-values for every state-action pair, we would only create a table entry for those state-action pairs which are actually likely to be visited by the agent, based on the current policy. This would be a significantly smaller subset of the entries in the full Q-value table, making it feasible to fit in memory. This process would then continue over a number of iterations, with each iteration creating a finer-grained discretisation. At each iteration, the increased granularity of the discretisation would be countered by a reduction in the portion of the state-action space which is actually stored in memory, so that memory usage stays below a fixed upper bound. The effect of this is a curriculum learning approach: the agent first solves a simple task with coarse discretisation, which is then used to initialise learning of more and more complex tasks, as the discretisation approximates the real world more and more accurately.
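
One possible shape for a single refinement step is sketched below. This is only an illustration under strong assumptions: a one-dimensional state in [0, 1), a fixed discrete action set (only the state space is refined here), a Python dictionary as the sparse table, and hypothetical helper names `discretise` and `refine`. How the visited states are chosen, and when refinement is triggered, are open design questions for the project.

```python
def discretise(x, bins):
    """Map a continuous state x in [0, 1) to a cell index at the given resolution."""
    return min(int(x * bins), bins - 1)

def refine(q_coarse, visited_states, coarse_bins, factor=2):
    """Build a finer sparse Q-table, creating entries only for visited states.

    q_coarse: dict mapping (cell, action) -> Q-value at the coarse resolution.
    visited_states: continuous states seen while following the current policy.
    """
    fine_bins = coarse_bins * factor
    actions = {a for (_, a) in q_coarse}
    q_fine = {}
    for x in visited_states:
        fine_cell = discretise(x, fine_bins)
        coarse_cell = discretise(x, coarse_bins)
        for a in actions:
            # Initialise each fine entry from its coarse parent (the curriculum step).
            q_fine[(fine_cell, a)] = q_coarse.get((coarse_cell, a), 0.0)
    return q_fine, fine_bins

# Hypothetical example: 2 coarse cells, 2 actions, policy only visits the upper half.
q_coarse = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.2, (1, 1): 0.9}
visited = [0.6, 0.7, 0.9]
q_fine, bins = refine(q_coarse, visited, coarse_bins=2)
```

In this example the finer table has 4 cells and 2 actions, so a full table would need 8 entries, but only the 4 entries for the visited cells are created.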


This is a new idea which has the potential to address many of the shortcomings of deep reinforcement learning, and which I am personally very excited to explore. Experiments will begin where your Reinforcement Learning DQN coursework finished, by comparing your DQN to a tabular equivalent. The first milestone is to fully implement the curriculum learning idea for the maze environment, and get some initial results for a proof of concept. Then, a fuller set of experiments will be done in more sophisticated robotics simulators, such as OpenAI Gym [3] and DeepMind Control Suite [4]. There are then lots of avenues for this project to further develop depending on your own interests, with both theoretical and experimental contributions possible.



[1] Reinforcement Learning: An Introduction

[2] Human-level Control through Deep Reinforcement Learning

[3] OpenAI Gym

[4] DeepMind Control Suite

OpenAI Gym

Curriculum Reinforcement Learning: Re-visiting Tabular Methods

Robot Learning from Human Demonstrations via Imagination

When training a robot to perform a new task, one of the first steps is to describe the task to the robot. For example, if the robot uses reinforcement learning, then this description is in the form of a reward function. However, in practice, we want the average person to be able to teach a robot a new task, without that person needing to understand what a reward function is, or how to write Python code to tell the robot about this reward function. For example, a commercial robot that can clean and tidy the home will need to be able to learn these particular cleaning tasks directly from its owner. One solution to this is imitation learning [1], where the person provides a demonstration of the task [2], such as by physically moving the robot's arm to interact with an object. The robot's job is then to understand how to perform that same task when the environment subsequently changes, such as when the object is in a new position.

This project will explore an imitation learning method where a robot learns to interact with an object, using a camera mounted to the robot's wrist as the robot's "eye" (e.g. [3], and the image on the right). The main idea is that images of the object would be captured during the demonstration, and then used to estimate the pose (position + orientation) of the object relative to the camera, such as with a convolutional neural network [4]. This pose can then be used to train a robot controller using labels from the human demonstrations, such as training a neural network with supervised learning to map from the object pose to the robot's action [2].


However, what happens if the object is in a pose which was not observed during the demonstrations? This method would fail.


Therefore, the main idea in this project is to use "imagination", where the robot can imagine what the object looks like from new viewpoints that have never previously been observed. This imagination module would be a neural network which would take as input two images, and output the relative pose between them. You will train this network over a large dataset of other random objects, and we would study how well this can then generalise to estimating poses for a novel object, even if that object was not in the original dataset. Hopefully, this would then allow a robot to learn a new task from just a single demonstration, without requiring you to manually move the object around between demonstrations. I would be very excited to see a system like this working in practice! Nobody in the world has achieved single-demonstration imitation learning for complex tasks yet.
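
As an illustration of what the network's training label could look like, the sketch below computes the relative pose between two viewpoints whose poses are known in a shared world frame (as they would be in simulation). It assumes planar poses (x, y, heading) purely for brevity, whereas a real system would use full 3D position and orientation. The function name `relative_pose` is hypothetical.

```python
import math

def relative_pose(pose_a, pose_b):
    """Return pose_b expressed in the frame of pose_a.

    Each pose is (x, y, theta) in a shared world frame, with theta in radians.
    """
    xa, ya, ta = pose_a
    xb, yb, tb = pose_b
    dx, dy = xb - xa, yb - ya
    # Rotate the world-frame offset into frame A (inverse rotation by ta).
    x_rel = math.cos(ta) * dx + math.sin(ta) * dy
    y_rel = -math.sin(ta) * dx + math.cos(ta) * dy
    # Wrap the heading difference to [-pi, pi].
    t_rel = math.atan2(math.sin(tb - ta), math.cos(tb - ta))
    return x_rel, y_rel, t_rel

# Viewpoint A faces along +y; viewpoint B is 2 units further along +y, same heading.
label = relative_pose((1.0, 0.0, math.pi / 2), (1.0, 2.0, math.pi / 2))
```

Here the label says that B lies 2 units straight ahead of A with no change in heading. A pair of images rendered from A and B, together with this label, would form one supervised training example.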


Experiments will initially be conducted in simulation using CoppeliaSim [5], which has an easy-to-use Python API [6]. If College opens up over the summer, you will then be able to perform some real-world experiments with our robots, where you will physically provide the human demonstrations yourself.


[1] An Algorithmic Perspective on Imitation Learning

[2] Deep Imitation Learning

[3] Deep Learning a Grasp Function

[4] PoseNet

[5] CoppeliaSim

[6] PyRep

Robot Learning from Human Demonstrations

Self-Supervised Robot Learning from "Playing"

Imagine you wanted to ask a robot to perform a task. One way of describing this task to the robot is to provide an image of the completed task. For example, you could show a robot an image of your desk in its "tidy" state, and then ask the robot to re-create this environment state every time the desk becomes messy. For a machine learning approach to this, a neural network could be trained which takes as input the target image and the current image, and outputs an action. Then, a sequence of such actions would be executed by the robot in a loop, until the target state is achieved.

But how could we train such a network? This project will investigate an idea based on exploration, similar to reinforcement learning.

The overall idea is that a robot could randomly explore its environment, "playing" with objects in an arbitrary and unstructured manner [1], in a similar way to how babies learn motor skills. This would then generate data tuples: the initial image, the final image, and the action. And this data could then be used to train the above network with supervised learning, simply from random data collection. Since the data is collected and labelled automatically, this is a form of self-supervised learning. The first milestone is to achieve a basic implementation of this idea, with a robot that randomly pushes around objects on a table [1].
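
The data-collection and training loop can be sketched on a toy problem. Everything here is a simplifying assumption: the "images" are replaced by a single scalar (the object's position), `step` is a made-up stand-in for the simulator, and the "network" is a one-parameter linear model fitted by least squares. The point is only to show that the action labels come for free from the robot's own random exploration.

```python
import random

def step(position, action):
    """Toy stand-in for the simulator: a push moves the object by 0.8 * action."""
    return position + 0.8 * action

def collect_play_data(n, seed=0):
    """Random exploration: record (initial, final, action) tuples with no human labels."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        initial = rng.uniform(-1.0, 1.0)
        action = rng.uniform(-0.5, 0.5)
        data.append((initial, step(initial, action), action))
    return data

def fit_inverse_model(data):
    """Least-squares fit of action = w * (final - initial), a one-parameter inverse model."""
    num = sum((f - i) * a for i, f, a in data)
    den = sum((f - i) ** 2 for i, f, a in data)
    return num / den

data = collect_play_data(200)
w = fit_inverse_model(data)

# Use the learned model to reach a target state from a current state.
current, target = 0.2, 0.5
reached = step(current, w * (target - current))
```

In this toy setting the fitted model recovers the inverse of the push dynamics, so a single predicted action moves the object to the target. With real images and contact dynamics, the model would be a deep network and several actions would be executed in a loop.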


One of the challenges you will soon face is that of scalability. The amount of data collected could be huge, and it would be difficult for a single neural network to do a good job of remembering all of this data. Therefore, one idea to make this more scalable is based on the intuition that some actions require less precision than other actions. For example, imagine a task which involves a robot inserting a plug into a socket. When the plug is far away from the socket, the specific trajectory taken by the robot's arm is not very important. But when the plug is just a few millimetres away from the socket, the specific trajectory is very important. Therefore, the loss function for the supervised learning could be adjusted such that the network "tries harder" at remembering important actions, such as when two objects are very close to each other. The second milestone is therefore to develop a method to automatically tune this loss function as training progresses.
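
A minimal sketch of such a weighted loss is given below. The weighting function, its `scale` parameter and the use of object distance as the measure of importance are all hand-chosen assumptions for illustration. The milestone itself is to tune this weighting automatically rather than fix it by hand.

```python
import math

def precision_weight(distance, scale=0.01):
    """Hypothetical weighting: larger when the two objects are closer together."""
    return 1.0 + math.exp(-distance / scale)

def weighted_mse(predicted, target, distances):
    """Mean squared action error, with each sample weighted by how precise it must be."""
    weights = [precision_weight(d) for d in distances]
    errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)

# Two samples: the first with objects touching (distance 0), the second 1 m apart.
distances = [0.0, 1.0]
loss_error_when_near = weighted_mse([1.0, 0.0], [0.0, 0.0], distances)
loss_error_when_far = weighted_mse([0.0, 1.0], [0.0, 0.0], distances)
```

The same action error is penalised more heavily when it occurs on the sample where the objects are close, which is the "tries harder" behaviour described above.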


Following this, there are a number of interesting directions in which the project could go. For example, we could investigate a curriculum learning approach, where the tasks become more and more difficult over time. This would also emulate how babies learn, where more and more complex tasks are attempted as the baby grows older and gains experience. A recent Master's project in my lab made some good progress with curriculum learning for robotics [2]. So I would personally be very interested to see just how far we can push this, and see if it can enable a robot to learn genuinely complex tasks purely with random exploration. Another extension would be to speed up learning by providing some human demonstration, to guide the agent such that it explores interesting behaviours based on human intuition [3]. Or, we could investigate more open-ended methods for exploration, such as those which encourage natural curiosity [4]. So, lots of exciting options!


Experiments will be conducted in simulation using CoppeliaSim [5], which has an easy-to-use Python API [6].



[1] Learning to Poke by Poking

[2] Curriculum Reinforcement Learning

[3] Learning Latent Plans from Play

[4] Intrinsically Motivated Reinforcement Learning

[5] CoppeliaSim

[6] PyRep

Robot Learning from Random Exploration

Residual Reinforcement Learning with Discrete and Continuous Actions

Deep reinforcement learning [1] is a very exciting area of research, but current methods are notoriously unstable. You will have found this in the Reinforcement Learning coursework, when your trained agent suddenly "forgot" how to solve the task, and its policy changed dramatically. One of the reasons for this is that updating a neural network based on one transition will change the network's prediction for another transition. Whilst this generalisation can be useful for interpolating between data points, sometimes it causes an unintended smoothing effect between data points, when the state of the network before the update was actually better. This slows down the agent's ability to learn, and makes performance highly sensitive to hyper-parameters.

In this project, you will study a new idea for addressing this, which I think has great potential. The idea is based on the fact that neural networks are much more stable when trained for classification, than when trained for regression. This is because with classification, the final layers of the network are independent for each discrete class, which "shields" some parts of the network from other parts: if two data points have different class labels, then updating the weights for one data point does not necessarily change the associated weights for a different data point. However, with regression, the final layers are used to make all predictions: updating the weights for one data point changes the weights for another data point, which may be undesirable.


This motivates the formulation of a "residual" implementation of deep reinforcement learning. Here, the networks used in deep reinforcement learning (e.g. the Q-network) will have two heads: a classification head, and a regression head. The classification head outputs a coarse estimate of the prediction (e.g. a classification of the Q-values with respect to a discrete set of potential values). Then, the regression head outputs a residual value, which is added to the classification value. In this way, the regression head does not need to predict the full range of values across the dataset: it only needs to predict a "correction", which is a much easier function to learn. Hopefully, this will also reduce the instability of pure regression approaches, since the regression range is significantly smaller and so instability effects will be attenuated.
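
The split between the two heads can be illustrated with the target encoding alone, leaving the network aside. In the sketch below, a scalar value is decomposed into a class index (the coarse bin) and a residual from that bin's centre, and then recombined. The value range, the number of bins, the choice of bin centres and the function names are all assumptions for illustration.

```python
def encode_target(value, v_min, v_max, n_bins):
    """Split a scalar target into (class index, residual from that bin's centre)."""
    width = (v_max - v_min) / n_bins
    index = min(max(int((value - v_min) / width), 0), n_bins - 1)
    centre = v_min + (index + 0.5) * width
    return index, value - centre

def decode_prediction(index, residual, v_min, v_max, n_bins):
    """Recombine the two heads: coarse bin centre plus the regression correction."""
    width = (v_max - v_min) / n_bins
    return v_min + (index + 0.5) * width + residual

# Hypothetical Q-value of 3.7, with values assumed to lie in [0, 10] and 10 bins.
index, residual = encode_target(3.7, 0.0, 10.0, 10)
value = decode_prediction(index, residual, 0.0, 10.0, 10)
```

For any value inside the assumed range, the residual never exceeds half a bin width in magnitude, so the regression head only has to cover a range of 1.0 here rather than the full range of 10.0.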


There is good reason to believe that this will work for deep reinforcement learning algorithms, since similar reasoning has been shown to improve the performance in supervised learning. I am very curious to see if we can get this working for a range of different tasks in OpenAI Gym [2] and DeepMind Control Suite [3], and there will likely be some interesting theoretical contributions of this project. For example, you will have to work out how to adjust the Bellman Equation under this residual framework. And as the project progresses, you will then look into making similar adaptations to other reinforcement learning algorithms, such as actor-critic methods [4].



[1] Human-level Control through Deep Reinforcement Learning

[2] OpenAI Gym

[3] DeepMind Control Suite

[4] Soft Actor-Critic

DeepMind Control Suite

Deep Learning for Robotics: Transferring from Simulation to Reality

Deep reinforcement learning [1] and deep imitation learning [2] have proven to be very attractive avenues for robot learning, with a growing body of work showing their applicability to a wide variety of robotics tasks. However, the scalability of those algorithms -- and deep learning in general -- is hindered by their reliance on a massive amount of data. A promising solution to this issue is the use of computer simulation to train the networks, followed by a subsequent deployment of those networks to the real world. Nonetheless, a major challenge in this sim-to-real transfer lies in the “reality gap”, which refers to the differences between the training (simulation) and testing (real world) domains.

This project aims to study the visual aspect of the reality gap for robot control policies, and find the best method to achieve sim-to-real transfer. The project will consist of two parts. The first part is the implementation and deployment of several methods [3-7] which claim to address this issue. These methods have never been evaluated against each other, and some have only been shown to work on very simple tasks in virtual environments. We aim to determine which of these works best in a real robot control setting. For the second part of the project, once a conclusion is reached for the existing methods, you will be given freedom to innovate by proposing your own sim-to-real method, in an effort to improve even further on the state of the art. For policy training, initial experiments will study basic visual servoing tasks with supervised learning, and then later you will have the opportunity to investigate sim-to-real for both reinforcement learning and imitation learning methods.


Experiments will initially be conducted in simulation using CoppeliaSim [8], which has an easy-to-use Python API [9]. Two simulated worlds will be created, with one of these acting as the training domain, and the other acting as the testing domain (where the simulator emulates the real world). If College begins to open up over the summer, you will then be able to perform experiments with the robots in our lab, and study how well these policies perform in the real world, even though they have only ever been trained in simulation.



[1] QT-Opt

[2] Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World

[3] Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task

[4] Sim-to-Real via Sim-to-Sim: Data-efficient Robotic Grasping via Randomized-to-Canonical Adaptation Networks

[5] Robust Visual Domain Randomization for Reinforcement Learning

[6] Network Randomisation: A Simple Technique For Generalisation in Deep Reinforcement Learning

[7] DIRL: Domain-Invariant Representation Learning for Sim-to-Real Transfer

[8] CoppeliaSim

[9] PyRep

Simulated policies (left and middle) being deployed in the real world (right)