A study led by the University of Cambridge has revealed that GPT-4, a sophisticated language model, possesses clinical knowledge and reasoning abilities akin to those of specialized eye doctors.
In the study, GPT-4 was pitted against doctors at varying career stages: unspecialized junior doctors, trainee eye doctors, and expert eye doctors. Each participant was tasked with diagnosing and recommending treatment for 87 patient scenarios involving specific eye conditions.
Outperforming junior doctors
Remarkably, GPT-4 outperformed unspecialized junior doctors, who possess similar levels of eye-related expertise to general practitioners. Furthermore, GPT-4 achieved comparable scores to both trainee and expert eye doctors, though the top-performing human doctors still surpassed its performance.
While the researchers emphasize that large language models like GPT-4 are unlikely to replace healthcare professionals, they suggest that these models could enhance healthcare delivery as part of the clinical process. Specifically, they propose that cutting-edge language models such as GPT-4 could offer valuable insights, diagnoses, and management recommendations in controlled settings, such as patient triaging, or in situations where access to specialist healthcare providers is limited.
“We could realistically deploy AI in triaging patients with eye issues to decide which cases are emergencies that need to be seen by a specialist immediately, which can be seen by a GP, and which don’t need treatment,” the researchers explain.
“The models could follow clear algorithms already in use, and we’ve found that GPT-4 is as good as expert clinicians at processing eye symptoms and signs to answer more complicated questions.”
Refining the system
The refinement and advancement of these models require extensive collections of clinical text, and efforts are underway globally to facilitate this process.
The researchers note that their study improves on previous work by comparing the AI's capabilities directly with those of practicing doctors, rather than relying solely on examination results.
“Doctors aren’t revising for exams for their whole career. We wanted to see how AI fared when pitted against the on-the-spot knowledge and abilities of practicing doctors, to provide a fair comparison,” the researchers say. “We also need to characterize the capabilities and limitations of commercially available models, as patients may already be using them – rather than the internet – for advice.”
The questions covered a wide range of eye problems, including heightened sensitivity to light, decreased vision, lesions, and itchy or painful eyes. They were sourced from a textbook commonly used to test trainee eye doctors. Because this textbook is not readily accessible online, its content is unlikely to have been included in GPT-4's training data.
“Even taking the future use of AI into account, I think doctors will continue to be in charge of patient care. The most important thing is to empower patients to decide whether they want computer systems to be involved or not. That will be an individual decision for each patient to make,” the researchers conclude.