ChatGPT Shows Limited Ability to Recommend Guidelines-Based Cancer Treatments

For many patients, the internet serves as a powerful tool for self-education on medical topics. With ChatGPT now at patients’ fingertips, researchers from Brigham and Women’s Hospital, a founding member of the Mass General Brigham healthcare system, assessed how consistently the artificial intelligence chatbot provides recommendations for cancer treatment that align with National Comprehensive Cancer Network (NCCN) guidelines. Their findings, published in JAMA Oncology, show that in approximately one-third of cases, ChatGPT 3.5 provided an inappropriate (“non-concordant”) recommendation, highlighting the need for awareness of the technology’s limitations.

"Patients should feel empowered to educate themselves about their medical conditions, but they should always discuss with a clinician, and resources on the Internet should not be consulted in isolation," said corresponding author Danielle Bitterman, MD, of the Department of Radiation Oncology and the Artificial Intelligence in Medicine (AIM) Program of Mass General Brigham. "ChatGPT responses can sound a lot like a human and can be quite convincing. But, when it comes to clinical decision-making, there are so many subtleties for every patient’s unique situation. A right answer can be very nuanced, and not necessarily something ChatGPT or another large language model can provide."

The emergence of artificial intelligence tools in health has been groundbreaking and has the potential to positively reshape the continuum of care. Mass General Brigham, as one of the nation’s top integrated academic health systems and largest innovation enterprises, is leading the way in conducting rigorous research on new and emerging technologies to inform the responsible incorporation of AI into care delivery, workforce support, and administrative processes.

Although medical decision-making can be influenced by many factors, Bitterman and colleagues chose to evaluate the extent to which ChatGPT's recommendations aligned with the NCCN guidelines, which are used by physicians at institutions across the country. They focused on the three most common cancers (breast, prostate and lung cancer) and prompted ChatGPT to provide a treatment approach for each cancer based on the severity of the disease. In total, the researchers included 26 unique diagnosis descriptions and used four, slightly different prompts to ask ChatGPT to provide a treatment approach, generating a total of 104 prompts.

Nearly all responses (98 percent) included at least one treatment approach that agreed with NCCN guidelines. However, the researchers found that 34 percent of these responses also included one or more non-concordant recommendations, which were sometimes difficult to detect amidst otherwise sound guidance. A non-concordant treatment recommendation was defined as one that was only partially correct; for example, for a locally advanced breast cancer, a recommendation of surgery alone, without mention of another therapy modality. Notably, complete agreement in scoring only occurred in 62 percent of cases, underscoring both the complexity of the NCCN guidelines themselves and the extent to which ChatGPT's output could be vague or difficult to interpret.

In 12.5 percent of cases, ChatGPT produced “hallucinations,” or a treatment recommendation entirely absent from NCCN guidelines. These included recommendations of novel therapies, or curative therapies for non-curative cancers. The authors emphasized that this form of misinformation can incorrectly set patients’ expectations about treatment and potentially impact the clinician-patient relationship.

Going forward, the researchers are exploring how well both patients and clinicians can distinguish between medical advice written by a clinician versus a large language model (LLM) like ChatGPT. They are also prompting ChatGPT with more detailed clinical cases to further evaluate its clinical knowledge.

The authors used GPT-3.5-turbo-0301, one of the largest models available at the time they conducted the study and the model class that is currently used in the open-access version of ChatGPT (a newer version, GPT-4, is only available with the paid subscription). They also used the 2021 NCCN guidelines, because GPT-3.5-turbo-0301 was developed using data up to September 2021. While results may vary if other LLMs and/or clinical guidelines are used, the researchers emphasize that many LLMs are similar in the way they are built and the limitations they possess.

"It is an open research question as to the extent LLMs provide consistent logical responses as oftentimes 'hallucinations' are observed," said first author Shan Chen, MS, of the AIM Program. "Users are likely to seek answers from the LLMs to educate themselves on health-related topics - similarly to how Google searches have been used. At the same time, we need to raise awareness that LLMs are not the equivalent of trained medical professionals."

Chen S, Kann BH, Foote MB, Aerts HJWL, Savova GK, Mak RH, Bitterman DS.
Use of Artificial Intelligence Chatbots for Cancer Treatment Information.
JAMA Oncol. 2023 Aug 24:e232954. doi: 10.1001/jamaoncol.2023.2954

Most Popular Now

Philips and Medtronic Advocacy Partnersh…

Royal Philips (NYSE: PHG, AEX: PHIA), a global leader in health technology, and Medtronic Neurovascular, a leading innovator in neurovascular therapies, today announced a strategic advocacy partnership. Delivering timely stroke...

Wearable Cameras Allow AI to Detect Medi…

A team of researchers says it has developed the first wearable camera system that, with the help of artificial intelligence (AI), detects potential errors in medication delivery. In a test whose...

New AI Tool Predicts Protein-Protein Int…

Scientists from Cleveland Clinic and Cornell University have designed a publicly-available software and web database to break down barriers to identifying key protein-protein interactions to treat with medication. The computational tool...

AI for Real-Rime, Patient-Focused Insigh…

A picture may be worth a thousand words, but still... they both have a lot of work to do to catch up to BiomedGPT. Covered recently in the prestigious journal Nature...

New Research Shows Promise and Limitatio…

Published in JAMA Network Open, a collaborative team of researchers from the University of Minnesota Medical School, Stanford University, Beth Israel Deaconess Medical Center and the University of Virginia studied...

G-Cloud 14 Makes it Easier for NHS to Bu…

NHS organisations will be able to save valuable time and resource in the procurement of technologies that can make a significant difference to patient experience, in the latest iteration of...

Start-Ups will Once Again Have a Starrin…

11 - 14 November 2024, Düsseldorf, Germany. The finalists in the 16th Healthcare Innovation World Cup and the 13th MEDICA START-UP COMPETITION have advanced from around 550 candidates based in 62...

Hampshire Emergency Departments Digitise…

Emergency departments in three hospitals across Hampshire Hospitals NHS Foundation Trust have deployed Alcidion's Miya Emergency, digitising paper processes, saving clinical teams time, automating tasks, and providing trust-wide visibility of...

MEDICA HEALTH IT FORUM: Success in Maste…

11 - 14 November 2024, Düsseldorf, Germany. How can innovations help to master the great challenges and demands with which healthcare is confronted across international borders? This central question will be...

A "Chemical ChatGPT" for New M…

Researchers from the University of Bonn have trained an AI process to predict potential active ingredients with special properties. Therefore, they derived a chemical language model - a kind of...

Siemens Healthineers co-leads EU Project…

Siemens Healthineers is joining forces with more than 20 industry and public partners, including seven leading stroke hospitals, to improve stroke management for patients all over Europe. With a total...

MEDICA and COMPAMED 2024: Shining a Ligh…

11 - 14 November 2024, Düsseldorf, Germany. Christian Grosser, Director Health & Medical Technologies, is looking forward to events getting under way: "From next Monday to Thursday, we will once again...