Big Data and Genomics

I’ve written a lot recently about the rise in genomic data and the applications being developed on top of it. Take, for instance, a recent project involving IBM, the New York Genome Center (NYGC), The Rockefeller University and other NYGC member institutions.

The work compared a number of techniques commonly used to analyze genomic data from tumor cells and healthy cells, and used Watson for Genomics technology to help interpret the genome data. The project found that Watson was able to provide actionable insights in just 10 minutes, compared with approximately 160 hours of human analysis.

Of course, the more data you have, the better, and I’ve previously covered efforts by genomics startup Genos to facilitate that. They promise to sequence your entire exome for $499, with a fascinating caveat: the company have announced that they will contribute to the cost of your sequencing on the proviso that you donate your data to scientific research. The hope is that opening up data to research in this way will help close the widening gap between the amount of genetic data being sequenced and our understanding of it.

Further evidence of the growing importance of genomic data comes via a partnership between the UK Biobank and the European Genome–phenome Archive (EGA), which is itself a joint resource developed by EMBL-EBI and the Centre for Genomic Regulation (CRG).

The partnership will see the data from all 500,000 UK Biobank participants distributed via the EGA resource. Biobank participants provided blood, urine and saliva samples for future analysis, including genetic analysis, and gave detailed information about themselves. They also agreed to allow UK Biobank to integrate information from their electronic health records.

This provides a vast quantity of data for healthcare research, and any work resulting from its use will be made available in the public domain for other researchers to build upon.

Sense making

Of course, collecting the data is only one part of the equation; the analysis is equally important. Last year, the University of California, San Diego released a new search engine designed to make our genomic data records easier to search.

The search engine, called GeNemo, is documented in a recently published paper and is aimed specifically at functional genomic data.

Functional genomics data is valuable because it records the range of activities of each region of the genome. The developers hope the search engine will help researchers uncover the functions of the parts of the genome we believe are responsible for disease.

The search engine allows users to query a range of databases, including the entire ENCODE dataset. Its search algorithm uses pattern matching to offer richer results than traditional text-based searches.

Swiss startup Sophia Genetics are arguably the market leaders in this space. They claim to have the largest clinical genomics community in the world, with an AI-powered platform to help make sense of the genetic data collected.

The company, which recently raised $30 million in a funding round led by Balderton Capital, have deployed their platform in 334 hospitals across 53 countries. To date they’ve analyzed data from more than 125,000 patients around the world.

Privacy concerns

One of the appealing aspects of the Sophia approach is that they process only the anonymized data collected by the hospitals themselves. With something as valuable as our genomic data, however, privacy remains a key concern.

A recent paper published in PLOS Biology by a pair of health law researchers from the University of Alberta argues that the industry as a whole currently lacks basic legal and ethical principles around consent, a problem that is only likely to intensify as more genomic data is generated.

Initiatives such as UK Biobank allow researchers to embark upon projects with hundreds of thousands of participants. However, issues persist around who owns those samples and what participants have consented to in terms of their use. The authors contend that real policy movement is needed to address these concerns, especially as industry becomes increasingly involved.

“The international research community has built a massive and diverse research infrastructure on a foundation that has the potential to collapse, in bits or altogether. This issue would benefit from more explicit recognition of the vast disconnect between the current practices and the realities of the law, research ethics and public perceptions,” they say.

The topic was also covered extensively in a recent report by Professor Dame Sally Davies on the current state of genomic service provision in NHS England.

The report examines the potential for genomics to significantly improve the health of the nation, and provides clear evidence of its value in areas such as screening, disease diagnosis and personalized prevention services.

The report goes on to highlight some serious shortfalls in infrastructure, public engagement, the organization of research and the provision of services, before providing clear recommendations on how each of these gaps can be addressed and access to genomic services widened.

It’s clear that this is an area undergoing some pretty rapid change, and one that will demand attention in the coming years.