I’ve written before about the impressive strides being made in the use of big data in medical research. To an extent, the biggest challenge is getting hold of the data, but that isn’t to say that processing challenges don’t exist.
That was the topic of a recent paper from researchers at the Technical University of Munich, who in the course of their research developed a new tool to help scan biological databases for bacterial sequences.
The hunt for bacteria
Microbial communities play a crucial role in the global ecosystem, both in climate terms and in the biological processes of humans and other animals.
The last few years have seen the volume of biological data available to us mushroom, especially as DNA sequencing has become faster and cheaper. Central to understanding the role of bacteria is the 16S rRNA gene, as sequencing it is the most commonly used way of identifying bacteria.
The Sequence Read Archive is an open database that contains around 100,000 of these 16S rRNA gene sequence datasets, with this huge volume largely a consequence of the changes in sequencing technology available today.
“Over all these years, a tremendous amount of sequences from human environments such as the intestine or skin, but also from soils or the ocean has been accumulated”, the researchers say. “We have now created a tool which allows these databases to be searched in a relatively short amount of time in order to study the diversity and habitats of bacteria”, says Clavel. “With this tool, a scientist can conduct a query within a few hours in order to find out in which type of samples the bacterium he is interested in can be found, for example a pathogen from a hospital. This was not possible before.”
The tool, called Integrated Microbial Next Generation Sequencing (IMNGS), aims to make that kind of search faster. The researchers put it through its paces using the intestinal bacterium Acetatifactor muris, with the outcome described in the aforementioned paper.
The tool is currently available to registered users, who can carry out a range of queries filtered by the origin of the bacterial data. They can also download entire sequences.
Next gen bioinformatics
This kind of bioinformatics is likely to play an increasingly important role in the clinical diagnostics performed on a daily basis. That role will only grow as more members of microbial communities are described. It’s a challenge the researchers are prepared for.
“Improving the quality of sequence datasets by collecting new reference sequences is a great challenge ahead”, they say. “Moreover, the quality of datasets is not yet good enough: the description of individual samples in databases is incomplete, and hence the comparison possibilities using IMNGS are currently still limited.”
As with so much in life, this endeavor will only be helped by greater collaboration, not least with the kind of clinics that will be performing searches and making diagnoses. It’s certainly a project worth keeping an eye on.