So it’s pleasing to see the recent launch of BioStudies from the European Molecular Biology Laboratory. It aims to provide a repository for structured and unstructured data.
The data will sit separately from the study itself, in the hope that this makes it easier to update as needed.
“When all the data supporting a paper is grouped in one place, it simplifies things for journals and authors – but also other researchers who want to re-examine the data in new contexts,” says Jo McEntyre, head of Literature Services at EMBL-EBI. “It’s easier to cite the data, and easier for authors to add important information after publication. Authors also have a permanent location for their supplemental data, where it’s linked to the relevant articles and data in standard repositories, and available for re-use and discovery.”
Traditionally, a single study draws on data in a range of formats, such as images, numerical data and genomic data, each of which tends to be archived in a different resource. This scattering can make it difficult to analyze the data as a whole.
“Say you’re submitting a paper about your new study, and it’s based on sequence data in the ENA, a metabolomics dataset in MetaboLights, a spreadsheet with data that doesn’t fit anywhere and a dozen images,” says McEntyre. “Keeping it in one place makes it coherent: all the links and the unstructured data are archived and presented as one BioStudies record. This makes it much easier to take advantage of the emerging, more robust methods for citing the data in an article.”
The aim is for each dataset submitted to the repository to come complete with detailed metadata so that it can easily be located, regardless of the format it’s in.
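To make the idea concrete, here is a minimal sketch of what such a grouped record might look like, using McEntyre's example of ENA sequence data, a MetaboLights dataset, a spreadsheet and a dozen images. Everything in it is an assumption made for illustration: the class names, field names, accession placeholders and the `matches` helper are hypothetical and do not reflect the actual BioStudies schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Pointer to data held in a structured repository."""
    repository: str   # e.g. "ENA" or "MetaboLights"
    accession: str    # placeholder identifier, not a real accession


@dataclass
class AttachedFile:
    """Unstructured file stored alongside the record."""
    name: str
    kind: str         # e.g. "spreadsheet" or "image"


@dataclass
class StudyRecord:
    """Hypothetical stand-in for one study record; not the BioStudies schema."""
    title: str
    metadata: dict[str, str] = field(default_factory=dict)
    links: list[Link] = field(default_factory=list)
    files: list[AttachedFile] = field(default_factory=list)

    def matches(self, term: str) -> bool:
        """Search the title and metadata only, so data format is irrelevant."""
        needle = term.lower()
        return needle in self.title.lower() or any(
            needle in value.lower() for value in self.metadata.values()
        )


# One record groups the links and the unstructured files from the example.
record = StudyRecord(
    title="Example study",
    metadata={"organism": "Homo sapiens", "keywords": "metabolomics, sequencing"},
    links=[Link("ENA", "PLACEHOLDER-1"), Link("MetaboLights", "PLACEHOLDER-2")],
    files=[AttachedFile("extra_measurements.xlsx", "spreadsheet")]
    + [AttachedFile(f"image_{i:02d}.png", "image") for i in range(1, 13)],
)
```

Because the search in this sketch looks only at the title and metadata, a spreadsheet or an image is as easy to find as a sequence dataset, which is the point of attaching detailed metadata to every submission.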
The team behind the project hope that the repository will provide them with advance warning of emerging technologies and formats, especially those for which standards have not yet emerged.
Once these have been identified, the team can work with the community to help devise a clear structure for such datasets.
With open science a fundamental plank of the EU’s innovation policy, projects like this should go some way towards delivering on that promise.