Data sharing is key to overcoming the reproducibility crisis

Opening up the data that underpins scientific research is something I’ve long supported, and whilst some progress has been made in this area, there is still much to be done.

For instance, I wrote earlier this year about a study from Elsevier that discussed an apparent paradox in the research world. It revealed that whilst the majority of researchers openly admit that their work would benefit from more open data, few of them actually share their own data.

The study surveyed more than 1,200 researchers from around the world, in fields including genetics and the humanities, and reached a number of clear conclusions:

Researchers support open data – at least when it comes to the benefits their own research derives from it. They’re much less familiar with sharing their own data, citing inexperience and academic culture as significant reasons.
Funders are not driving change – researchers didn’t feel that funders’ wishes for wider data sharing were driving change, and most believe they own the data used in their work.
Many are still not sharing at all – 34% of researchers don’t publish data at all, and those who do usually share it as tables and annexes rather than raw data.
Patchy standards – researchers are evenly divided between those who think that good standards exist for citing published data and those who do not.
Subject-specific sharing – sharing practices also differed markedly across subject areas, with some fields firmly embedding data sharing into the design and execution of research.

Supporting reproducibility

One thing the Elsevier study didn’t touch on is the reproducibility challenge facing the research industry. A recent paper from researchers at Penn State suggests that this may have as much to do with the difficulty of managing data as with any other factor.

“What we researchers try to do is provide the science-consuming public with genuine insights about brain and behavior,” the authors say. “We want to say things that are robust and true. Without reproducibility it’s hard to say that convincingly.”

A growing number of technology systems can help researchers not only ensure their own data is usable but also work easily with data generated (and shared) by others. Data repositories, such as the Databrary platform developed by the paper’s lead author, Rick Gilmore, offer further support.

Nowhere are the challenges greater than in cognitive neuroscience. It’s an extremely computationally intensive field, with data produced in a wide range of sizes and formats by devices such as EEG systems and MRI scanners. The result is a fragmented landscape for data sharing in the field.

“Right now, data sharing is still largely unfunded and unrewarded and is only rarely required,” the authors say. “It’s something that isn’t a universal requirement for federal grant funding, for example.”

They suggest that, rather than treating the published paper as the finished product, researchers should regard the data underpinning it as equally important.

“In addition to publishing scientific papers, behavioral and brain scientists need to be more open about the detailed procedures underlying their studies, more freely share the statistical programs that they use in analyzing data,” they say. “And researchers should share the data itself as openly as possible.”

It’s a principle that I think most people outside the research industry understand, but changing behaviors inside it remains a challenge.