Ordinarily, AI systems require vast quantities of data to train on, but it’s increasingly possible to train algorithms on data that is largely synthetic. In many domains this isn’t necessary because real data is abundant, but in healthcare that isn’t always the case, especially for very rare conditions.
A recent paper published by NVIDIA, MGH & BWH Center for Clinical Data Science and the Mayo Clinic explores how generative adversarial networks (GANs) can help to overcome a lack of data by artificially generating it.
The researchers began with two open data sets containing MRI scans of the brain. One of these contained brains with Alzheimer’s, whilst the other showed brains with tumors. These were used to train the GAN to generate its own images.
A model was then trained on a mixture of 10% real data and 90% GAN-generated data, and that training proved as effective as training on real-world data.
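To make the 10/90 split concrete, here is a minimal Python sketch of how such a mixed training set could be assembled. It is an illustration only, not the researchers' pipeline: the function `mix_training_set`, its parameters, and the placeholder records are all hypothetical, and real code would load actual scans rather than tuples.

```python
import random


def mix_training_set(real, synthetic, total, real_fraction=0.1, seed=0):
    """Sample a training set in which a fixed share of examples is real.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n_real = round(total * real_fraction)
    n_synth = total - n_real
    if n_real > len(real) or n_synth > len(synthetic):
        raise ValueError("not enough examples to build the requested mix")
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    mixed = rng.sample(real, n_real) + rng.sample(synthetic, n_synth)
    rng.shuffle(mixed)  # avoid feeding all real examples first
    return mixed


# Placeholder records standing in for (image, label) pairs.
real_scans = [("real", i) for i in range(50)]
synthetic_scans = [("synthetic", i) for i in range(500)]

train_set = mix_training_set(real_scans, synthetic_scans, total=200)
n_real = sum(1 for source, _ in train_set if source == "real")
print(n_real, len(train_set) - n_real)  # 20 180
```

Sampling without replacement keeps each scan from appearing twice, and the explicit error makes it obvious when there are too few real scans to reach the requested share.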
Knowing the limits
So is this the solution to any scenario where data is limited? Not really. A GAN can’t create data unlike anything it has seen before, so if new cases contain details that were absent from its training data, it could corrupt the whole process. This is often the case with tumor images, as the biology is still not entirely understood.
It is a degree of progress nonetheless, and the technology can accurately generate pictures with tumors that are either larger or smaller than normal, or indeed move them to other parts of the brain.
It’s certainly an interesting technology, and it’s nice to see it being put to slightly more productive use than the fake media images it has largely been used for to date. A previous study has shown that GANs can also produce realistic images of skin lesions, whilst another performed a similar trick with liver lesions.
It’s perhaps fair to say that there is a little way to go before such technology is ready for use in the wild, but it’s a clear sign of the progress being made.