Healthcare increasingly looks like a big data problem. Projects such as UK Biobank have encouraged data sharing, and the White House recently launched its Cancer Moonshot initiative, which will screen around 250,000 samples in search of biomarkers that signal cancer.
As part of the project, Berg Healthcare has been brought in to apply its AI expertise to the task. It will have access to the Clinical Breast Care Project, which houses 13,600 samples of healthy and diseased breast tissue.
This work should generate genomic insights into how cancerous and healthy cells differ, whether in their cellular processes, proteins or mutations. When combined with each patient's medical history, it is hoped that this will allow Berg to train its algorithms to distinguish healthy from diseased tissue using new biomarkers.
Healthcare as a data problem
Another example of this in action is a recent paper from researchers at the University of California, San Diego, who integrated multiple data sets to uncover new biological patterns in cellular processes.
Central to this approach is obtaining large quantities of so-called omic data, which spans everything from genes and proteins to RNA profiles and metabolites. This gives a more holistic view of cellular processes, but managing the data is hugely challenging.
“When doing big data analysis, it is important to know how all these different data types are related. Now we have a way of connecting multiple different data types to generate fundamental answers to biological questions,” the authors say.
“While all these data types are derived from the same cell, they represent processes occurring at very different scales. Our work is about getting multiple different data types synchronized so that we can understand the coordination of these processes and derive meaning from them,” they continue.
The researchers examined how the different omic data types relate to one another, and their analysis uncovered a number of new regularities in areas such as protein translation, including the points at which translation pauses. They believe the discovery will play a significant role in understanding how proteins are produced, which in turn will be valuable in studying the biology of cancer.
“Now we have a fundamental explanation for these pause sites that we didn’t have before. It’s as if we’re witnessing an intricate dance with a certain rhythm to make sure that a protein is formed the right way,” the authors say.
The work is part of a much wider National Institutes of Health program called Big Data to Knowledge, which aims to help turn huge biological data sets into usable information.
With biological data becoming richer and our tools for analyzing it growing more varied, this is an extremely promising time for the industry. It's an area I'll be watching with great interest.