When we register with an online or digital service, we sign an end user agreement. Such agreements detail how a company will collect, store and use the data relating to its end users. These agreements are long, and users rarely have time to read them thoroughly. However, they are a rich source of material on which the company’s data retention and usage policies (its data governance policy) are based.
A data governance policy is a set of guidelines for ensuring the proper management of an organization’s digital information. Such guidelines can involve policies for data security, quality and privacy.
This project aims to build an automated toolset that analyses the text of an end user agreement and converts it into (enforceable) data governance policies. These policies can then be used to audit the data management of the relevant company, to see whether it complies with its own policies as stipulated in the end user agreement. For data auditing purposes, the organisation’s data provenance records would be used as evidence of whether the organisation is abiding by its own data governance policies. The toolset could also be used to verify whether an organisation’s end user agreement conforms to relevant data protection laws.
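As a rough illustration of the pipeline described above, the following minimal Python sketch extracts a retention policy from agreement text and checks provenance records against it. Everything in it is an assumption made for illustration: the `RetentionPolicy` class, the `extract_policies` and `audit` functions, the phrasing matched by the regular expression, and the record fields `data_type` and `held_days` are all hypothetical. A real toolset would need proper language processing rather than a single pattern, and a real provenance format.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: a data type may be held for at most max_days."""
    data_type: str
    max_days: int


# Assumed phrasing only: "retain/store/keep your <data type> for [up to] <N> days".
_RETENTION = re.compile(
    r"\b(?:retain|store|keep)s?\s+your\s+(?P<data>[a-z ]+?)\s+for\s+"
    r"(?:up to\s+)?(?P<days>\d+)\s+days\b",
    re.IGNORECASE,
)


def extract_policies(agreement_text):
    """Turn matching sentences of the agreement into RetentionPolicy objects."""
    return [
        RetentionPolicy(m.group("data").lower(), int(m.group("days")))
        for m in _RETENTION.finditer(agreement_text)
    ]


def audit(policies, provenance_records):
    """Return the provenance records that exceed a policy's retention limit."""
    limits = {p.data_type: p.max_days for p in policies}
    return [
        r for r in provenance_records
        if r["data_type"] in limits and r["held_days"] > limits[r["data_type"]]
    ]


agreement = (
    "We retain your email address for 30 days. "
    "We may store your payment details for up to 90 days."
)
records = [  # made-up provenance records, for illustration only
    {"data_type": "email address", "held_days": 45},
    {"data_type": "payment details", "held_days": 10},
]
policies = extract_policies(agreement)
violations = audit(policies, records)
```

Here `violations` holds only the email address record, since it has been held for longer than the 30 days the agreement allows. Keeping extraction and auditing as separate steps means the same extracted policies could, in principle, also be compared against data protection rules rather than provenance records.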
The student should have an interest in, and a willingness to learn, basic human-language processing, and would ideally have prior knowledge of data governance policies. Ideally, the student would also have a firm grasp of the C or C++, Java and C# programming languages. The student should have good time management and strong writing skills, and be a self-starter.
It is intended that, once the implementation is working, it can be used for practical trials. We anticipate submitting a conference paper for publication based on the implementation and subsequent trials; the author of the code would be a co-author of this paper.