Software Engineering – Undergraduate Research Opportunities

A Cyclic Prover for Propositional Dynamic Logic (available)

Propositional Dynamic Logic (PDL) [1] is a logic in which abstract properties about the behaviour of programs can be expressed, in a very general way. Because of its generality it is suitable for checking the behaviour or a wide range of programs and systems, via model checking. PDL is what is known as a modal … full description “A Cyclic Prover for Propositional Dynamic Logic (available)”

A Debugging User Study: Validating Debugging Research Findings (available)

Project Description: Understanding and fixing software faults is a challenging task for developers [1]. To address this challenge, researchers have designed several debuggers. For instance, automated fault localisation (AFL) techniques (e.g., Ochiai [1]) and automated program repair (APR) tools (e.g., GenProg[2]) are designed to support developers during software debugging tasks. In addition, researchers have gathered … full description “A Debugging User Study: Validating Debugging Research Findings (available)”

Analyzing the Evolution of Fairness properties in ML-based Code repositories (available)

Project Description: Machine Learning (ML) systems are vital components of everyday products and services in our society. ML models have been popularly employed in law enforcement (e.g., COMPASS), legal use cases (e.g., LegalBERT), and programming language tasks (e.g., CodeBERT, Github Copilot, etc.). Despite the criticality of these use cases, ML systems are often biased towards … full description “Analyzing the Evolution of Fairness properties in ML-based Code repositories (available)”

Attacking Large Pre-trained Programming Language Models (PLMs) via Backdoors (completed)

Project Description: Backdoors refer to a class of Machine Learning (ML) attacks where an adversary trains an ML model to intentionally misclassify any input to a specific label [1]. This is typically achieved by poisoning the training data, such that inputs are misclassified to a target label when the backdoor trigger is present. For instance, … full description “Attacking Large Pre-trained Programming Language Models (PLMs) via Backdoors (completed)”

Automated Debugging of Invalid Inputs generated by Fuzzers (available)

Project Description: Fuzzing is a popular testing method used to ensure the reliability, security and correctness of software systems. These tools allow developers to find bugs and vulnerabilities in software systems automatically. For instance, AFL is a popular fuzzer that has exposed thousands of bugs in open-source software provided by Google, Amazon and Firefox [1]. … full description “Automated Debugging of Invalid Inputs generated by Fuzzers (available)”

Cyclist: Verifying Programs using Cyclic Proof (available)

Cyclic proofs are a technique for proving properties inductively, or proving properties about inductively defined objects or processes. They can be used to prove logical statements, or to verify recursive programs. The Cyclist tool [1] has been developed as an automatic verification tool based on cyclic proofs. It can prove termination and verify temporal properties … full description “Cyclist: Verifying Programs using Cyclic Proof (available)”

Empirical evaluation of static verifiers for the Go programming language (available)

Go is a language that natively supports many constructs to synchronise concurrent threads. Because of this, developing code that contains subtle concurrency bugs is rather common. To address this problem, several groups of researchers have put forwards tools to find concurrency bugs in Go codebases. The objective of this project is to evaluate how some … full description “Empirical evaluation of static verifiers for the Go programming language (available)”

Engineering ROTOR: a Refactoring Tool for OCaml (ongoing)

OCaml [1] is a mature functional programming language with an expressive type system. Recently, we have developed a prototype tool, called ROTOR, for automatically refactoring OCaml codebases [2]. Currently, ROTOR handles renaming of functions. This is surprisingly hard due to OCaml’s powerful module system: renaming a function in one module may actually require renaming functions … full description “Engineering ROTOR: a Refactoring Tool for OCaml (ongoing)”

Formalising Recursive Adaptive Grammars using a Proof Assistant (available)

The aim of this project is to formalise using proof assistant (such as Coq [1], Agda [4], or Isabelle [5]) the meta-theory of the recursive adaptive grammar framework [2] Programming languages are usually specified on two levels. The first level defines a context-free grammar that specifies the syntax of the language. Usually, however, not all … full description “Formalising Recursive Adaptive Grammars using a Proof Assistant (available)”

Implementation of biologically inspired efficient deep learning models (available)

As deep learning models continue to grow in size and complexity to tackle increasingly difficult tasks, the need for efficient and scalable models becomes ever more important. Extremely large language models like GPT-4 require massive computational resources and expensive hardware to train and run. This makes them impractical to deploy at scale in many real-world … full description “Implementation of biologically inspired efficient deep learning models (available)”

Improving Automatic Bug Detection in JavaScript (completed)

Dynamic symbolic execution (DSE) is an effective tool for bug detection in real software. Like unit testing and fuzzing DSE executes portions of a program, exposing bugs through runtime program exceptions. In DSE, some inputs to the program under test are made “symbolic” while the rest are fixed. Whenever the symbolic execution encounters a conditional operation … full description “Improving Automatic Bug Detection in JavaScript (completed)”

Jupyter notebooks (available)

Prerequisites: experience with Python (useful) and Javascript programming (essential). It may be useful to be willing to learn about functional programming (but this is not essential). Jupyter notebooks [1] are examples of literate programming [2] where code and outputs from the code as well as documentation are in the same application. Jupyter allows the user … full description “Jupyter notebooks (available)”

Machine Learning Library for OCaml (completed)

Frameworks for machine learning include Python’s TensorFlow [1]. These frameworks provide standard ways of specifying models that can be optimised by machine learning algorithms. OCaml [2] is a mature functional programming language with an expressive type system. DecML [3] is a prototype OCaml extension that facilitates implementing machine learning tasks, based on specifying models as … full description “Machine Learning Library for OCaml (completed)”

Predicting Debug Symbols for Closed Source Binaries (completed)

Reverse engineering binaries, whether malicious or benign, is made more difficult by the absence of debug information. Variables and functions have had their identifiers “stripped”, so reverse engineers have to manually name them during analysis based on human understanding of the code functionality. The goal of this project is to use machine learning to predict … full description “Predicting Debug Symbols for Closed Source Binaries (completed)”

REST API and web frontend for a JavaScript symbolic testing framework (completed)

ExpoSE.js is a symbolic testing framework being developed at Royal Holloway designed to assist developers in improving the security and reliability of JavaScript applications, a language for which traditional software testing solutions have failed to produce satisfactory results. Symbolic execution is a technique which allows for the systematic enumeration of feasible paths of a program. … full description “REST API and web frontend for a JavaScript symbolic testing framework (completed)”

Siri for Secure Programming – On-fly Secure Programming Assistant (completed)

Project Description Secure coding is a set of best practices for making software (during development) as secure and stable as possible. It encompasses everything from recommendations for cryptographic usage, moving sensitive data, accessing a file system, and managing memory. As the security landscape is always changing, secure coding requires programmers to be up to date … full description “Siri for Secure Programming – On-fly Secure Programming Assistant (completed)”

Smart IDE for Cascading Style Sheets in Web Development (completed)

You will construct an experimental IDE to aid programmers in writing Cascading Style Sheets (CSS) for web applications. Style sheets dictate how a web page appears, and consist of a series of rules which are applied to elements of the web page. Determining exactly which rule should be applied to which element is not entirely … full description “Smart IDE for Cascading Style Sheets in Web Development (completed)”

Visitors for Generalized Algebraic Data Types in OCaml (completed)

Algebraic datatypes (ADTs) [1] are the basic method for defining how to build complex data values in functional programming languages: they specify both the basic, or atomic, values, as well as the constructors for building larger values out of smaller ones. For example, the following ADT defines generic lists that contain values of type ‘a … full description “Visitors for Generalized Algebraic Data Types in OCaml (completed)”

Visualising Interactions Between Developers During Code Review (completed)

Code review (sometimes referred to as pull request) is a code change auditing technique done by developers other than the author of the change. Recent studies demonstrated diverse benefits of code review. For example, Bacchelli and Bird reported that code review is effective to share knowledge between developers and to improve code changes [1]. In … full description “Visualising Interactions Between Developers During Code Review (completed)”

WebAssembly-based microarchitectural covert channel attacks: capabilities, proof-of-concept, and implications (completed)

WebAssembly-based microarchitectural covert channel attacks: capabilities, proof-of-concept, and implications Microarchitectural covert channels are a threat to data confidentiality in multi-tenant environments (cloud platform, mobile phone, etc.). This type of leakage channel aims at tunnelling information across isolation boundaries (sandboxing, censorship, etc.) by exploiting timing variations during program execution. Indeed, the state of microarchitectural components … full description “WebAssembly-based microarchitectural covert channel attacks: capabilities, proof-of-concept, and implications (completed)”

Working on computer mediated artwork with the Tate Gallery (completed)

The Tate Gallery owns a set of major artworks which are computer mediated; that is they employ computers to manage interaction with gallery visitors in a variety of often technically challenging ways. We are working with the conservators at the Tate to document these systems and to think about managing their future in the long … full description “Working on computer mediated artwork with the Tate Gallery (completed)”