Data Stewardship (available)

Starting Date: 1 June 2019
Duration: 5 weeks full-time, 10 weeks part-time
Time commitment: Full time or part time
Prerequisites: Understanding of databases and formats such as JSON; ability to interview and liaise with non-experts; ability to write reports.

Approximately 80% of a Data Scientist's day-to-day working time is spent
finding relevant data sets and getting that data into an appropriate
format (CrowdFlower Data Science survey 2016). Data stewardship is a
comparatively new role in which an individual oversees these tasks,
ensuring that data sets are available to individuals within an
organisation (or beyond) and that they are properly annotated and
cleaned. Such roles are likely to become increasingly important in
industry and research. In this context, the FAIR principles (Findable,
Accessible, Interoperable, Reusable) [1] are coming to the fore as a
guide to how data, and the metadata describing it, should be made
available.

In collaboration with the Strategic Planning office of Royal Holloway,
the student will carry out a series of interviews with individuals to
establish:

– the key data sets that are generated within the University, and what
data needs to be integrated from outside the organisation;

– what analysis steps are typically carried out on these data sets;

– what analyses cannot easily be done;

– how the data is stored, and how long it needs to be kept.
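The interview questions above map naturally onto a per-data-set inventory record. The sketch below is one possible way to capture the answers, assuming Python; the class, field and function names (`DatasetRecord`, `open_questions`) are hypothetical and not part of the project brief. It simply flags data sets whose retention requirement has not yet been established.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical inventory entry summarising interview answers for one data set."""
    name: str
    internal: bool  # True if generated within the University
    external_inputs: list[str] = field(default_factory=list)  # data integrated from outside
    typical_analyses: list[str] = field(default_factory=list)
    blocked_analyses: list[str] = field(default_factory=list)  # analyses not easily done today
    storage: str = "unknown"  # where and how the data is currently stored
    retention_years: int | None = None  # None means the requirement is still unknown


def open_questions(inventory: list[DatasetRecord]) -> list[str]:
    """Return the names of data sets whose retention requirement is still unknown."""
    return [record.name for record in inventory if record.retention_years is None]


if __name__ == "__main__":
    # Placeholder entries for illustration only; not real University data sets.
    inventory = [
        DatasetRecord("example-internal", internal=True, retention_years=10),
        DatasetRecord("example-external", internal=False, external_inputs=["partner feed"]),
    ]
    print(open_questions(inventory))  # ['example-external']
```

Keeping unknown answers explicit (rather than defaulting them) makes the gaps left after each interview easy to list.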

On the basis of these interviews, the student will build an initial
prototype of the optimal database for storing these data sets, together
with the annotation they require (using the Dublin Core metadata
standards [2] and [3]). The student will also write a report outlining
the overall challenges in improving the current storage of data, what
the optimal platform would be, and roughly how much time and effort
would be required to implement it.
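A minimal sketch of what such a prototype might look like, assuming SQLite as the store and the 15-element Dublin Core Metadata Element Set (version 1.1) as the annotation vocabulary. The table and column names and the helpers `create_store`, `add_dataset` and `export_json` are hypothetical, not a recommended design. Annotations are stored one row per element value, because Dublin Core elements are optional and repeatable, and a CHECK constraint rejects any element outside the set. Records can be exported as JSON.

```python
from __future__ import annotations

import json
import sqlite3

# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)
_ALLOWED = ", ".join(f"'{e}'" for e in DC_ELEMENTS)

# Hypothetical schema: one row per data set, one row per (element, value) pair.
SCHEMA = f"""
CREATE TABLE IF NOT EXISTS dataset (
    id       INTEGER PRIMARY KEY,
    location TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS dc_annotation (
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    element    TEXT NOT NULL CHECK (element IN ({_ALLOWED})),
    value      TEXT NOT NULL
);
"""


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the prototype store and build the schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def add_dataset(conn: sqlite3.Connection, location: str,
                metadata: dict[str, list[str]]) -> int:
    """Store one data set and its Dublin Core annotations (elements are repeatable)."""
    with conn:  # single transaction: rolled back if any element is invalid
        cur = conn.execute("INSERT INTO dataset (location) VALUES (?)", (location,))
        dataset_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO dc_annotation (dataset_id, element, value) VALUES (?, ?, ?)",
            [(dataset_id, el, v) for el, values in metadata.items() for v in values],
        )
    return dataset_id


def export_json(conn: sqlite3.Connection, dataset_id: int) -> str:
    """Export one data set's annotations as a JSON object keyed by element."""
    record: dict[str, list[str]] = {}
    rows = conn.execute(
        "SELECT element, value FROM dc_annotation WHERE dataset_id = ? ORDER BY rowid",
        (dataset_id,),
    )
    for element, value in rows:
        record.setdefault(element, []).append(value)
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    conn = create_store()
    ds = add_dataset(conn, "/path/to/example.csv", {  # placeholder values
        "title": ["Example data set"],
        "creator": ["Example creator"],
        "format": ["text/csv"],
    })
    print(export_json(conn, ds))
```

Storing annotations as element/value rows keeps the prototype flexible while requirements are still emerging from the interviews; a fuller design might instead adopt the DCMI Metadata Terms and controlled vocabularies, which this sketch does not attempt.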

[1] M. D. Wilkinson et al., ‘The FAIR Guiding Principles for
scientific data management and stewardship’, Sci. Data, vol. 3, p.
160018, 2016.

[2] M. Dekkers and S. Weibel, ‘State of the Dublin Core Metadata
Initiative, April 2003’, D-Lib Mag., vol. 9, no. 4, Apr. 2003.