Privacy issues concerning stored data about end-users are well understood and well studied. However, a large body of data can also be collected that does not describe the user directly but instead describes the user's data; this is known as data provenance.
This project will deploy a database with a data provenance framework and then populate it with synthetic data. The aim is to understand whether data provenance records can violate the privacy policies of an organisation and so weaken the privacy afforded to end-users. Furthermore, if a third party only has access to the data provenance records, can they violate the privacy requirements of an individual, as stipulated by the data governance policies of the respective organisation?
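As a rough illustration of the question, the sketch below builds a small in-memory database holding synthetic records alongside provenance records, then shows what a third party could infer from the provenance table alone. Everything in it is an assumption made for illustration: it uses Python's built-in sqlite3 module rather than MySQL, and the table names (`patient`, `provenance`), the column layout, the actor names, and the helper functions are hypothetical and not part of any particular provenance framework.

```python
import sqlite3

# Hypothetical synthetic data: (record id, name, diagnosis, actor, timestamp).
SYNTHETIC_ROWS = [
    (1, "Alice Example", "fracture", "orthopaedics_dept", "2024-01-05T09:00"),
    (2, "Bob Example", "lymphoma", "oncology_clinic", "2024-01-06T14:30"),
]


def build_demo_db():
    """Create an in-memory database with a data table and a provenance table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patient (
            id INTEGER PRIMARY KEY,
            name TEXT,
            diagnosis TEXT
        );
        CREATE TABLE provenance (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            record_id INTEGER,
            actor TEXT,
            action TEXT,
            at TEXT
        );
        """
    )
    for record_id, name, diagnosis, actor, at in SYNTHETIC_ROWS:
        conn.execute(
            "INSERT INTO patient (id, name, diagnosis) VALUES (?, ?, ?)",
            (record_id, name, diagnosis),
        )
        # Each write to the data table is logged as a provenance record.
        conn.execute(
            "INSERT INTO provenance (record_id, actor, action, at) "
            "VALUES (?, ?, ?, ?)",
            (record_id, actor, "insert", at),
        )
    conn.commit()
    return conn


def infer_from_provenance(conn):
    """Return {record_id: set of actors} using ONLY the provenance table."""
    inferred = {}
    rows = conn.execute("SELECT record_id, actor FROM provenance")
    for record_id, actor in rows:
        inferred.setdefault(record_id, set()).add(actor)
    return inferred


if __name__ == "__main__":
    demo = build_demo_db()
    for record_id, actors in sorted(infer_from_provenance(demo).items()):
        print(record_id, sorted(actors))
```

In this toy setting the third party never reads the `diagnosis` column, yet learning that a record was written by an actor named `oncology_clinic` already hints at sensitive information. Whether such an inference breaches a given organisation's data governance policy is the kind of question the project would investigate.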
The student should have an interest in, and a willingness to learn, basic data provenance, and should have prior knowledge of basic MySQL. Ideally, they would be familiar with user privacy requirements and have a firm grasp of C, C++, Java, or C#. Good time-management and strong writing skills are important. We will use Git and LaTeX to write up the results; prior experience of these tools would be helpful but is not required.
It is intended that, once the implementation is working, it can be used for a practical trial using synthetic data. We anticipate that a conference paper based on the implementation and subsequent trials will be submitted for publication; the author of the code would be a co-author of this paper.