Opening up data from clinical trials has been a key aspiration for science policy makers over the last few years. The aim is clear, with officials believing that opening up both trial results and the data behind them will enable the whole process to be significantly more efficient, especially if companies are encouraged to publish failed studies as well as successful ones.
An important first step down that road was recently taken with the launch of OpenTrialsFDA. The project, which is one of the finalists in the Open Science Prize, aims to let researchers easily search data from the FDA’s drug approval packages. These contain information on both the methods and the results of clinical trials, including trials whose results are never published elsewhere.
This data is often not very accessible today, even if it is technically available. For instance, much of it is not in a machine-readable format: many of the files are physical documents that have been scanned as images. A lack of clear indexing also makes the data difficult to navigate.
Opening up trial data
The team hopes that the platform will allow researchers not only to access and search raw trial data from both published and unpublished trials, but also to examine any discrepancies between the data in drug approval packages and the data published in journal articles.
The site combines a web interface with an API that allows third parties to access, search and present the FDA information. Data is pulled in from Drugs@FDA, and optical character recognition technology is used to automate the extraction of text from the scanned documents. Algorithms are then used to hunt for clinical trial identifiers.
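The identifier-hunting step can be illustrated with a minimal Python sketch. It is not the project's actual algorithm. It assumes the identifiers of interest are ClinicalTrials.gov registry numbers, which take the form "NCT" followed by eight digits. The function name `find_trial_ids` and the sample text are hypothetical.

```python
import re

# ClinicalTrials.gov registry IDs are "NCT" followed by eight digits.
# Assumption: OCR output may insert one space or hyphen after the prefix,
# or change its case, so the pattern tolerates both.
NCT_PATTERN = re.compile(r"\bNCT[\s-]?(\d{8})\b", re.IGNORECASE)


def find_trial_ids(ocr_text: str) -> list[str]:
    """Return unique, normalized NCT identifiers in order of first appearance."""
    seen = {}  # dict keys keep insertion order and drop duplicates
    for match in NCT_PATTERN.finditer(ocr_text):
        seen.setdefault("NCT" + match.group(1), None)
    return list(seen)


# Invented sample text standing in for OCR output from a scanned document.
sample = "Study 301 (NCT01234567) and its extension, nct 07654321; see also NCT01234567."
print(find_trial_ids(sample))
```

Normalizing every match to a single spelling means the same trial can be linked across documents even when the OCR output is inconsistent. A real pipeline would probably also need to handle character confusions such as the letter O read in place of a zero, which this sketch does not attempt.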
The open source project is one of six finalists in the Open Science Prize, which aims to promote open data technology that advances biomedical research. Each finalist is awarded $80,000 to help develop its tool.
The other finalists were:
- Fruit Fly Brain Observatory – a project to improve modeling of mental and neurological diseases by connecting data on the fruit fly brain
- Open Neuroimaging Laboratory – a project to advance brain research by supporting annotation, discovery and analysis of brain imaging data
- MyGene2: Accelerating Gene Discovery with Radically Open Data Sharing – a project to support public sharing of health and genetic data
- OpenAQ: A Global Community Building the First Open, Real-Time Air Quality Data Hub for the World – a project to provide real-time information on air quality from around the world
- Real-Time Evolutionary Tracking for Pathogen Surveillance and Epidemiological Investigation – a project to support analysis of epidemics as they emerge