Computation on the PID graph with GraphQL queries (available)

Starting Date: 1 June 2020
Duration: 10 weeks
Time commitment: Full time
Prerequisites: Python. An understanding of databases and APIs. Familiarity with GraphQL is a plus but not essential, as the student can use this project as an opportunity to learn it.


Persistent Identifiers (PIDs) provide a lasting way to identify entities, something that other identifiers such as URLs cannot guarantee. The best known are DOIs, which typically identify published articles, but a wide variety of other identifiers exist, such as ORCID iDs, which identify individuals. PIDs do more than uniquely identify a publication, dataset, or person: their metadata can provide unambiguous links between persistent identifiers of the same type, e.g. journal articles citing other journal articles, or of different types, e.g. a researcher and the datasets they produced.

Work is needed to connect existing persistent identifiers to each other in standardized ways, e.g. to the outputs associated with a particular researcher, repository, institution or funder, for discovery and impact assessment. Some of the more complex but important use cases can’t be addressed by simply collecting and aggregating links between two persistent identifiers, including:

  1. Aggregate the citations for all versions of a dataset or software source code

  2. Aggregate the citations for all datasets hosted in a particular repository, funded by a particular funder, or created by a particular researcher

  3. Aggregate all citations for a research object: a publication, the data underlying the findings in the paper, and the software, samples, and reagents used to create those datasets.
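To illustrate why the first use case needs more than a list of pairwise links, here is a minimal Python sketch on a toy in-memory dataset. All identifiers, the `VERSION_OF` mapping, and the `aggregate_citations` helper are hypothetical and exist only for illustration; real data would come from PID metadata.

```python
from collections import defaultdict

# Hypothetical toy data: every identifier below is made up for illustration.
# Each version DOI maps to the "concept" identifier that groups all versions.
VERSION_OF = {
    "10.1234/data.v1": "10.1234/data",
    "10.1234/data.v2": "10.1234/data",
    "10.1234/code.v1": "10.1234/code",
}

# (citing PID, cited PID) pairs, as might be harvested from PID metadata.
CITATIONS = [
    ("10.5555/paper-a", "10.1234/data.v1"),
    ("10.5555/paper-b", "10.1234/data.v2"),
    ("10.5555/paper-a", "10.1234/data.v2"),
    ("10.5555/paper-c", "10.1234/code.v1"),
]


def aggregate_citations(citations, version_of):
    """Return {concept PID: set of citing PIDs}, merging all versions."""
    citing = defaultdict(set)
    for source, target in citations:
        # Fall back to the target itself if it has no known concept PID.
        concept = version_of.get(target, target)
        citing[concept].add(source)
    return dict(citing)


totals = aggregate_citations(CITATIONS, VERSION_OF)
for pid, sources in sorted(totals.items()):
    print(pid, len(sources))
```

Because the citing PIDs are stored in a set, a paper that cites two versions of the same dataset is counted once for that dataset, which a plain count of links would get wrong.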

The PID Graph is a step towards addressing these use cases. It is accessible through a GraphQL interface, which can be queried from, for example, Python.
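As a rough sketch of what querying a GraphQL interface from Python can look like, the example below builds and sends a GraphQL request using only the standard library. The endpoint URL, the field names in the query, and the placeholder ORCID iD are assumptions for illustration and should be checked against the current API documentation; `build_request` and `run_query` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: endpoint of the GraphQL API; verify against current documentation.
ENDPOINT = "https://api.datacite.org/graphql"

# Assumption: field names are illustrative and may not match the live schema.
QUERY = """
query ($id: ID!) {
  person(id: $id) {
    name
    datasets { totalCount }
  }
}
"""


def build_request(query, variables, endpoint=ENDPOINT):
    """Build a POST request carrying a GraphQL query as a JSON body."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables, endpoint=ENDPOINT):
    """Send the query and return the decoded JSON response (needs network)."""
    with urllib.request.urlopen(build_request(query, variables, endpoint)) as resp:
        return json.load(resp)


# Placeholder ORCID iD, not a real researcher.
request = build_request(QUERY, {"id": "https://orcid.org/0000-0000-0000-0000"})
```

Calling `run_query(QUERY, {...})` would perform the actual network request. Keeping request construction separate from sending makes the payload easy to inspect in a notebook before anything goes over the wire.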

The Project

A student working on this project would initially develop tools that make it easier to customise visualisations of the graph in a Jupyter notebook. Other possible projects include porting the API to run with R, or addressing specific requests listed in the PID forum.

At a more advanced level, the student can use their understanding of the API to explore the PID graph with, for example, graph analytic techniques, examining the overall connectivity of the graph or how connectivity differs between disciplines.
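One simple connectivity measure is the set of connected components of the graph. The sketch below computes them with a breadth-first search over a toy edge list; the edges and the `connected_components` helper are hypothetical, and a real analysis would more likely load edges retrieved from the API into a dedicated graph library.

```python
from collections import defaultdict, deque

# Hypothetical links between PIDs; all identifiers are made up.
EDGES = [
    ("doi:paper-1", "doi:dataset-1"),
    ("doi:dataset-1", "orcid:researcher-1"),
    ("doi:paper-2", "doi:software-1"),
]


def connected_components(edges):
    """Return a list of node sets, one per connected component."""
    # Treat links as undirected for the purpose of connectivity.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in list(neighbours):
        if start in seen:
            continue
        # Breadth-first search from an unvisited node.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


components = connected_components(EDGES)
sizes = sorted((len(c) for c in components), reverse=True)
print(sizes)
```

Running the same function on edges restricted to one discipline at a time would give a first, coarse comparison of connectivity across disciplines.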

More information

Fenner, M., & Aryani, A. (2019). Introducing the PID Graph (Version 1.0).

Fenner, M. (2019). Using Jupyter Notebooks with GraphQL and the PID Graph (Version 1.0).