The potential of increasingly powerful artificial intelligence to analyze increasingly available genomic data is significant. A good example of where we’re currently at comes via a recent study published by the New York Genome Center (NYGC), The Rockefeller University and other NYGC member institutions, and IBM.
The study compared a number of techniques commonly used to analyze genomic data from tumor cells and healthy cells, utilizing Watson for Genomics technology to help interpret the genome data. The paper reveals that Watson was able to provide actionable insights in just 10 minutes, compared with approximately 160 hours of human analysis.
It seems that we’re at something of a tipping point with regard to genomic data, with sequencing costs coming down to $1,000, and some companies even offering to sequence your genome for free in return for that data being donated to science. As such, the amount of genomic data researchers have to play with is likely to balloon significantly in the coming years. The challenge will then move towards making sense of the data.
It’s an informatics challenge that often forms a bottleneck in our approach to diseases such as cancer, and it’s a bottleneck that the team believe AI can help to overcome.
“Our partnership has explored cutting-edge challenges and opportunities in harnessing genomics to help cancer patients. We provide initial insights into two critical issues: what clinical value can be extracted from different commercial and academic cancer genomic platforms, and how to think about scaling access to that value,” they say.
DNA and RNA from a glioblastoma tumor specimen were analyzed alongside the patient’s normal blood. The findings were then compared against potentially actionable insights from a commercial targeted panel that had been performed on the same sample.
“This study documents the strong potential of Watson for Genomics to help clinicians scale precision oncology more broadly,” IBM say. “Clinical and research leaders in cancer genomics are making tremendous progress towards bringing precision medicine to cancer patients, but genomic data interpretation is a significant obstacle, and that’s where Watson can help.”
Big data medicine
Further evidence of the potential comes from the University of California San Francisco, where a team have developed a computational method to probe huge quantities of data to discover new ways of using drugs.
The approach would allow scientists to bypass the traditional experiments undertaken on biological specimens, and do computational analyses instead.
“This points toward a day when doctors may treat their patients with drugs that have been individually tailored to the idiosyncrasies of their own disease,” the researchers say.
The researchers used data from the Cancer Genome Atlas (TCGA), which is a detailed map of the genomic changes in numerous types of cancer. It contains over two petabytes of data and allowed the team to see how genes were regulated in the cancerous tissue in comparison to the normal tissue.
They were then able to search the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset, to see how thousands of compounds and chemicals affected cancer cells. This allowed them to form a league table of some 12,442 small molecules according to their ability to reverse abnormal changes in gene expression.
Finally, they used both the ChEMBL database and the Cancer Cell Line Encyclopedia to analyze and compare molecular profiles. Eventually, the analysis revealed four distinct drugs that were most likely to be effective.
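The core idea of the second step, ranking compounds by how well they reverse a disease's gene-expression signature, can be illustrated with a minimal sketch. This is not the UCSF team's actual code; the scoring function, toy gene signatures, and drug names below are all illustrative assumptions.

```python
# Hypothetical sketch of signature-reversal scoring: a drug whose
# induced expression changes run opposite to the disease signature
# scores highly and rises up the "league table".
from typing import Dict, List

def reversal_score(disease_sig: List[float], drug_sig: List[float]) -> float:
    """Negative dot product: positive when the drug pushes gene
    expression in the opposite direction to the disease."""
    return -sum(d * g for d, g in zip(disease_sig, drug_sig))

def rank_drugs(disease_sig: List[float],
               drug_sigs: Dict[str, List[float]]) -> List[str]:
    """Order drugs from strongest to weakest reversal of the signature."""
    return sorted(drug_sigs,
                  key=lambda name: reversal_score(disease_sig, drug_sigs[name]),
                  reverse=True)

# Toy example: three genes, expressed as tumor-vs-normal log-fold changes.
disease = [2.0, -1.5, 1.0]
drugs = {
    "drug_A": [-1.8, 1.2, -0.9],   # reverses the signature
    "drug_B": [1.9, -1.4, 1.1],    # mimics the disease
    "drug_C": [0.1, 0.0, -0.1],    # little effect either way
}
ranking = rank_drugs(disease, drugs)
print(ranking)  # drug_A ranks first, drug_B last
```

At LINCS scale the same comparison runs over thousands of genes and 12,442 compounds, with correlation-based measures typically standing in for the simple dot product used here.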
“Since in many cancers, we already have lots of known drug efficacy data, we were able to perform large-scale analyses without running any biological experiments,” the researchers say.
Suffice it to say, such successes are only possible if the data is of a high standard. A recent paper from Oregon Health & Science University highlights the importance of good design of the digital identifiers used to tag data in such big-data-led research projects.
The paper identifies some pragmatic guidelines to help create, reference and maintain web-based identifiers to improve reproducibility, attribution, and scientific discovery.
The data that researchers use for their work is only as useful as the robustness and uniqueness of its identifiers, which allow each record to be linked and discovered. Most of the time these identifiers emerge organically, and the researchers believe this laissez-faire approach threatens the usefulness of research.
As such, they urge the research community to do a better job of engineering the identifiers according to set standards and conventions.
“As with plumbing fixtures, the question of how identifiers work should only need to be understood by those that build and maintain them. However, everyone needs to know how identifiers should be used, and this is where convention is important,” they say. “Through this work, we hope to encourage all participants in the scholarly ecosystem – including authors, data creators, data integrators, publishers, software developers, and resolvers – to adhere to best practice in order to maximize the utility and impact of life science data.”
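To make the "convention" point concrete, here is a minimal sketch of what a well-engineered identifier scheme might look like in practice: a registered prefix, a locally unique accession, and a validity check before the identifier is minted. The prefix patterns below are illustrative assumptions, not the paper's actual guidelines; real registries define their own rules.

```python
# Hypothetical sketch of a prefix:accession identifier convention.
# Validating accessions against a per-prefix pattern before minting
# keeps records unambiguous and discoverable across datasets.
import re

# Illustrative patterns for two well-known resources (assumed shapes).
PATTERNS = {
    "ensembl": re.compile(r"^ENSG\d{11}$"),
    "chembl":  re.compile(r"^CHEMBL\d+$"),
}

def make_identifier(prefix: str, accession: str) -> str:
    """Return a compact 'prefix:accession' identifier, or raise if the
    accession does not match the convention for that prefix."""
    pattern = PATTERNS.get(prefix)
    if pattern is None or not pattern.match(accession):
        raise ValueError(f"invalid accession {accession!r} for prefix {prefix!r}")
    return f"{prefix}:{accession}"

print(make_identifier("ensembl", "ENSG00000141510"))  # ensembl:ENSG00000141510
```

The plumbing analogy holds: data consumers never need to know how the pattern check works, only that every identifier they encounter follows the same `prefix:accession` convention.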