How machine learning helps identify toxicity in potential drugs

New drugs typically take 12-14 years to make it to market, with a 2014 report finding that the average cost of getting a new drug to market had ballooned to a whopping $2.6 billion.

It’s a topic I’ve covered before, with a study published earlier this year highlighting how automation could be used to reduce the cost of drug discovery by approximately 70%.

Then there is the team from the University of Toronto, who in a recent paper describe using machine learning to generate 3D structures of protein molecules to assist with drug development.

“Designing successful drugs is like solving a puzzle,” the team say. “Without knowing the three-dimensional shape of a protein, it would be like trying to solve that puzzle with a blindfold on.”

The team believe that being able to determine the atomic structure of protein molecules will play a huge role in understanding how they work, and how they may respond to drug therapies. Drugs typically work by binding to a protein molecule and changing its shape, thus altering how it works.

Another example of what’s possible is provided by a recent study published in Cell Chemical Biology. The study describes a big-data-based approach to detecting toxic side effects that would prevent a drug from being used on humans, and to doing so before the drug reaches the expensive clinical trial stage.

Working with little data

Most of these approaches rely on huge datasets to derive their insights, but a recent Stanford University study highlights the potential of AI, even with relatively small amounts of data to play with.

The team use a new kind of deep learning known as one-shot learning, which can make predictions from relatively few data points.

“We’re trying to use machine learning, especially deep learning, for the early stage of drug design,” the team say. “The issue is, once you have thousands of examples in drug design, you probably already have a successful drug.”

A considerable problem to overcome is how to present the properties of small molecules to an algorithm. To make such data more digestible, the team represented each molecule via the connections that exist between its atoms. This process helped them identify a number of properties that could then be analyzed by the algorithm.
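To make the idea of representing a molecule by its atomic connections more concrete, here is a minimal sketch in Python. It is purely illustrative and is not the team's actual featurization: the function names, the choice of ethanol as an example, and the decision to ignore hydrogen atoms are all assumptions made for this sketch.

```python
def build_molecule_graph(atoms, bonds):
    """Return an adjacency list mapping each atom index to its bonded neighbours.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs, one per bond between atoms i and j
    """
    graph = {index: [] for index in range(len(atoms))}
    for first, second in bonds:
        graph[first].append(second)
        graph[second].append(first)
    return graph


def atom_features(atoms, graph):
    """Describe each atom by its element and how many neighbours it has."""
    return [(element, len(graph[index])) for index, element in enumerate(atoms)]


# Ethanol with hydrogens left out (an assumption, to keep the example small):
# carbon 0 - carbon 1 - oxygen 2
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

ethanol_graph = build_molecule_graph(ethanol_atoms, ethanol_bonds)
ethanol_features = atom_features(ethanol_atoms, ethanol_graph)
```

Simple per-atom descriptions like these are the kind of properties that can be derived once a molecule is stored as a set of connections rather than as a picture or a name.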

The algorithm was trained on two distinct datasets: one covering the toxicity of various chemicals, the other covering known side effects of approved medicines. In both cases, the algorithm was able to predict toxicity with reasonable accuracy.

“We worked on some prototype algorithms and found that, given a few data points, they were able to make predictions that were pretty accurate,” the team say.

On the shoulders of giants

The work is the latest in a series of iterative improvements in one-shot learning. It builds on what has gone before by relying on the closeness of different molecules.
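As a rough illustration of what relying on closeness can mean, the sketch below scores a new molecule by comparing it with a handful of labelled examples and weighting each example's label by its similarity. This is a simplified stand-in and not the team's model: the hand-written feature vectors, the use of cosine similarity, and the function names are all assumptions for the sake of the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_toxicity(query, support):
    """Similarity-weighted average of the labels in a small support set.

    support: list of (feature_vector, label) pairs, label 1 = toxic, 0 = not.
    Returns a score between 0 and 1; closer examples count for more.
    """
    similarities = [cosine_similarity(query, vector) for vector, _ in support]
    weights = [math.exp(s) for s in similarities]
    total = sum(weights)
    return sum(w * label for w, (_, label) in zip(weights, support)) / total


# Hypothetical feature vectors: one known toxic and one known non-toxic molecule.
support_set = [
    ([1, 0, 1, 0], 1),
    ([0, 1, 0, 1], 0),
]

# A new molecule that shares more features with the toxic example.
score_near_toxic = predict_toxicity([1, 0, 1, 1], support_set)

# A new molecule identical to the non-toxic example.
score_near_safe = predict_toxicity([0, 1, 0, 1], support_set)
```

With only two labelled examples, the score leans towards whichever one the new molecule most resembles, which is the intuition behind making predictions from very little data.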

The team believe that their algorithm could be a useful aid to chemists trying to choose the right molecule to analyze further in their studies.

“Right now, people make this kind of choice by hunch,” the team say. “This might be a nice complement to that: an experimentalist’s helper.”

It’s the first time that the one-shot approach has been applied in this way, which makes it particularly interesting. Suffice to say, it’s very much the start of the journey rather than the end, and it will be fascinating to see where it goes from here.

The code itself is openly available via the DeepChem library, so hopefully others will build on it, just as the team have built on the work of others.