The last few years have seen something of a crisis for the academic community as the reproducibility of research has been severely called into question.
Whilst much of the debate around the topic has focused on the validity of the science itself, part of the problem clearly stems from the deliberate manipulation of data.
The team behind Eterna believe the crowd can provide a good means of validating research findings, whilst a recent Stanford-based project takes an automated approach to the task.
Sniffing out fake findings
Just as we give off various ‘tells’ when we’re lying, the authors believe that papers tend to have common characteristics when they contain falsified data.
The team scoured the PubMed archive over a 40-year period for papers that had subsequently been retracted. Over that timeframe, some 253 papers fitted the bill, and these were then compared with unretracted papers from the same years, journals and subject areas.
The fraudulent papers were then rated along an ‘obfuscation index’ to gauge the degree to which the researchers had cooked the books.
“We believe the underlying idea behind obfuscation is to muddle the truth,” the authors say.
“Scientists faking data know that they are committing a misconduct and do not want to get caught. Therefore, one strategy to evade this may be to obscure parts of the paper. We suggest that language can be one of many variables to differentiate between fraudulent and genuine science.”
Covering one's tracks
The analysis revealed that fraudulent papers tended to score significantly higher on the obfuscation index than papers that were retracted for reasons other than fraud.
“Fraudulent papers had about 60 more jargon-like words per paper compared to unretracted papers,” the authors say. “This is a non-trivial amount.”
By using a particular form of language to cover up their lies, the researchers give the game away to a system that knows what to look for.
For instance, fraudulent papers were found to use fewer positive emotion terms, seemingly to stifle any praise for the data in case excessive praise drew unwanted attention to the work.
The ultimate aim is to automate the process so that suspicious papers can be flagged automatically and an editor can give them the scrutiny they deserve.
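To make the idea concrete, the sketch below shows roughly what such a flagging tool might look like: it counts jargon-like and positive-emotion terms in a paper's text and raises a flag when a crude composite score crosses a threshold. The word lists, weights and cutoff here are invented purely for illustration; the study's actual obfuscation index is built from established linguistic dictionaries and calibrated against a corpus of retracted and unretracted papers.

```python
import re
from dataclasses import dataclass

# Illustrative word lists only -- the real obfuscation index draws on
# established linguistic dictionaries, not a handful of hand-picked terms.
JARGON_TERMS = {"paradigm", "modality", "heterogeneity", "methodological",
                "multifactorial", "operationalise", "instantiate"}
POSITIVE_TERMS = {"remarkable", "excellent", "striking", "novel",
                  "important", "promising"}

@dataclass
class PaperScore:
    jargon_per_1k: float    # jargon-like words per 1,000 words
    positive_per_1k: float  # positive-emotion words per 1,000 words
    score: float            # crude composite: more jargon, less praise -> higher

def tokenize(text: str) -> list[str]:
    """Lower-case the text and pull out alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def score_paper(text: str) -> PaperScore:
    """Compute per-1,000-word rates for both term lists and a toy composite."""
    words = tokenize(text)
    n = max(len(words), 1)
    jargon = sum(w in JARGON_TERMS for w in words) / n * 1000
    positive = sum(w in POSITIVE_TERMS for w in words) / n * 1000
    # Invented weighting: reward jargon density, penalise positive emotion.
    return PaperScore(jargon, positive, jargon - positive)

def flag_for_review(text: str, threshold: float = 5.0) -> bool:
    """Return True if the paper deserves a closer editorial look.

    The threshold is arbitrary here; in practice it would have to be
    learned from labelled examples rather than hard-coded.
    """
    return score_paper(text).score > threshold
```

Everything numeric in this toy, from the word lists to the cutoff, would need to be learned from data rather than hard-coded, and that calibration is precisely where the authors say much more work is needed.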
“Science fraud is of increasing concern in academia, and automatic tools for identifying fraud might be useful,” the authors say. “But much more research is needed before considering this kind of approach.
“Obviously, there is a very high error rate that would need to be improved. But also science is based on trust, and introducing a ‘fraud detection’ tool into the publication process might undermine that trust.”