Medical AI Systems Struggle To Perform Well Across IT Systems

Expectations surrounding AI in healthcare have reached fever pitch in recent years, with a number of pilot projects achieving positive early results. Most of these projects involved AI systems being trained on a sample dataset of medical data, such as X-rays or other medical imagery, after which the system was capable of providing early detection of various conditions.

The challenge for many of these systems is that they were usually trained on data from a single healthcare provider using a common health IT system. A recent study highlights that when faced with data from different health systems, such AI technologies often perform significantly worse than doctors.

The research, conducted by the Icahn School of Medicine at Mount Sinai, used convolutional neural networks (CNNs) to analyze a large collection of chest X-ray images with the aim of diagnosing pneumonia. The system was trained and evaluated across three hospital systems in an attempt to simulate real-world pneumonia screening.

Lacklustre performance

Unfortunately, in three out of the five comparisons undertaken, the CNN performed worse on X-rays from hospitals outside its own network than it did on X-rays from its original network. The researchers believe this is largely because deep learning models rely on so many parameters that it is hard to identify which particular variables are driving their predictions. This makes it difficult to achieve consistent performance across multiple systems.
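The failure mode the researchers describe can be illustrated with a toy simulation. This is a hedged sketch, not the study's actual pipeline: a simple logistic-regression classifier stands in for the CNN, and a synthetic "artifact" feature stands in for hospital-specific signals (scanner type, image post-processing, patient mix) that correlate with the label at the training hospital but not elsewhere. All feature names and numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_site(n, artifact_shift):
    """Simulate one hospital's data. Feature 0 is a weak but genuine
    pathology signal; feature 1 is a site-specific artifact whose
    correlation with the label depends on the hospital (artifact_shift)."""
    y = rng.integers(0, 2, n)                              # pneumonia label
    pathology = y + rng.normal(0.0, 1.0, n)                # real, noisy signal
    artifact = artifact_shift * y + rng.normal(0.0, 0.3, n)  # confounder
    return np.column_stack([pathology, artifact]), y

def fit_logreg(X, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression (stand-in for the CNN)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(w, b, X, y):
    return float(np.mean(((X @ w + b) > 0) == y))

# Origin hospital: the artifact is strongly predictive of the label.
X_train, y_train = make_site(2000, artifact_shift=2.0)
X_int, y_int = make_site(1000, artifact_shift=2.0)   # internal test set
# External hospital: the same artifact carries no label information.
X_ext, y_ext = make_site(1000, artifact_shift=0.0)

w, b = fit_logreg(X_train, y_train)
acc_internal = accuracy(w, b, X_int, y_int)
acc_external = accuracy(w, b, X_ext, y_ext)
```

Because the classifier leans heavily on the artifact that only exists at its origin hospital, `acc_internal` comes out far higher than `acc_external`, mirroring the cross-site performance gap the study reports: the test-set numbers look excellent, yet the model has partly learned the hospital rather than the disease.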

“Our findings should give pause to those considering rapid deployment of artificial intelligence platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed,” the researchers say.

As such, the authors recommend that CNN systems be tailored to specific clinical questions and tested in a wider range of real-world scenarios than is often the case today. If this doesn’t happen, it’s quite probable that their performance in test scenarios will significantly overstate their real-world performance.

The team believe that before such systems are deployed in the wild, they need to be able to generalize accurately and effectively across a wide range of hospital systems. If they can’t, their accuracy is likely to fall well below what is required. It’s a finding that underlines the work still needed before the AI systems that have generated so much hype and expectation are ready for clinical use.
