FAIRness of Repositories & Their Data: Research Data Management Working Group Report

Posted: 24-06-2019 Topics: Strategy

The results of two data repository surveys, carried out by LIBER’s Research Data Management Working Group, are now available.

The report, which can be downloaded from Zenodo, summarises the answers given by managers, librarians and technical staff with regard to:

The FAIRness of repositories and their data;
Misconceptions related to the principles’ definition and implementation;
The complexity of the implementation and the importance of the FAIR principles for the repository community.

Best Practices

Best Practices for the implementation of the FAIR principles are summarised in the report, such as:

DOI, Handle, URN, URI or locally generated numbers should be used as permanent identifiers for metadata records and data.
Well-known standardized global vocabularies, such as ISO vocabularies for country and language codes and the COAR, OpenAIRE and DataCite vocabularies for publication/resource types, access status and roles, should be applied as much as possible.
Repositories should preserve information about data provenance stored in metadata: creator, institutions – publishers, source, mail address, publication year, production year, geo-location, data collector, data manager, distributor, editor, funder, producer, rights holder, sponsor, and supervisor.
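
As an illustration only, the three practices above could be checked against a single metadata record roughly as follows. This is a minimal sketch and is not taken from the report: the field names, the tiny code lists, the identifier heuristics and the example record (including its DOI) are all assumptions made for the example.

```python
import re
from urllib.parse import urlparse

# Hypothetical, deliberately tiny excerpts of the ISO code lists;
# a real check would load the full ISO 639-1 and ISO 3166-1 vocabularies.
LANGUAGE_CODES = {"en", "de", "fr", "nl"}
COUNTRY_CODES = {"GB", "DE", "FR", "NL"}

# Provenance fields picked for this sketch from the longer list above.
PROVENANCE_FIELDS = ("creator", "publisher", "publication_year")

# Rough shape of a DOI: "10.", a registrant code, "/", and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def identifier_scheme(value):
    """Guess which kind of identifier a string is (heuristic only)."""
    if DOI_PATTERN.match(value):
        return "doi"
    if value.lower().startswith("hdl:"):
        return "handle"
    if value.lower().startswith("urn:"):
        return "urn"
    if urlparse(value).scheme in ("http", "https"):
        return "uri"
    # Anything else is treated as a locally generated number.
    return "local"


def check_record(record):
    """Return a list of problems found in a flat metadata record."""
    problems = []
    if not record.get("identifier"):
        problems.append("missing permanent identifier")
    if record.get("language") not in LANGUAGE_CODES:
        problems.append("language is not a known ISO 639-1 code")
    if record.get("country") not in COUNTRY_CODES:
        problems.append("country is not a known ISO 3166-1 alpha-2 code")
    for field in PROVENANCE_FIELDS:
        if not record.get(field):
            problems.append(f"missing provenance field: {field}")
    return problems


example = {
    "identifier": "10.1234/example-dataset",  # made-up DOI, for illustration
    "language": "en",
    "country": "NL",
    "creator": "Jane Doe",
    "publisher": "Example University Library",
    "publication_year": 2019,
}

print(identifier_scheme(example["identifier"]))
print(check_record(example))
```

A record that uses a free-text language such as "english", or that lacks a creator, would come back with one entry per problem, which is the kind of feedback a repository could show to depositors.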

At the same time, the surveys highlighted some misunderstandings of the FAIR Principles and some misleading implementations.

Misunderstandings

Rich Metadata Models – What constitutes a rich metadata model is not well defined, and this leads to some misunderstanding of the F2 FAIR principle. In this survey, a majority of respondents said they used a rich metadata model, but 12 of the analyzed repositories had 13 or fewer mandatory fields. Of these, eight had seven or fewer mandatory fields.
Machine Readability – Nearly 80% of respondents said their repositories completely comply with the I1 FAIR principle: (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. This means humans and computers should be able to exchange and interpret each other’s data. Data should be readable for machines without the need for specialised or ad hoc algorithms, translators, or mappings. To ensure this, it is critical to use (1) commonly used controlled vocabularies, ontologies and thesauri and (2) a well-defined framework to describe and structure (meta)data. However, 45% didn’t answer or said they didn’t know whether their repository could display metadata in a semantic web technology such as OWL or RDF. Five repositories offer metadata in a semantic web technology, while four plan to implement this feature. The remaining seven respondents reported that there was no such possibility in their repository.
Metadata Provenance – Although provenance of (meta)data (R1.2) should be described in a machine-readable format, some implementations of this FAIR principle include a free-text provenance description or an attached file which describes provenance.
Missing Infrastructure – Because the I2 FAIR principle is often missed and is quite complicated to implement, an infrastructure, platform or service that could help with its implementation should be a top priority of EU and other funding programs.
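
To make the points on machine readability and metadata provenance more concrete, the sketch below shows one possible way to expose a record as JSON-LD, one of several RDF serialisations, using schema.org terms. The report does not prescribe this vocabulary or format; the `to_jsonld` helper, the input field names and the example values are assumptions made purely for illustration.

```python
import json


def to_jsonld(record):
    """Map a flat repository record to a JSON-LD document (schema.org terms)."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "identifier": record["identifier"],
        "name": record["title"],
        "inLanguage": record["language"],
        "datePublished": str(record["publication_year"]),
        # Provenance as typed, structured values instead of a free-text note
        # or an attached file, so that software can interpret it directly.
        "creator": {"@type": "Person", "name": record["creator"]},
        "publisher": {"@type": "Organization", "name": record["publisher"]},
    }


# Made-up example record; none of these values come from the surveys.
record = {
    "identifier": "https://doi.org/10.1234/example-dataset",
    "title": "Example survey responses",
    "language": "en",
    "publication_year": 2019,
    "creator": "Jane Doe",
    "publisher": "Example University Library",
}

document = to_jsonld(record)
print(json.dumps(document, indent=2))
```

The design choice here is simply that each provenance statement becomes a named property with a typed value, which is what distinguishes it from the free-text descriptions mentioned above.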

Thank you to everyone who took part in the surveys. On behalf of LIBER’s Research Data Management Working Group, we appreciate your time and effort.