Evaluating the Performance of AI-Based Large Language Models in Radiation Oncology

A new study evaluates how well artificial intelligence (AI)-based large language models (LLMs) perform on the American College of Radiology standardized examination in radiation oncology. The study is published in the peer-reviewed journal AI in Precision Oncology.

Nikhil Thaker, from Capital Health and Bayta Systems, and coauthors evaluated the performance of various LLMs, including OpenAI’s GPT-3.5-turbo, GPT-4, and GPT-4-turbo; Meta’s Llama-2 models; and Google’s PaLM-2-text-bison. The LLMs were given an exam of 300 questions, and their answers were compared with the performance of radiation oncology trainees.

The results showed that OpenAI’s GPT-4-turbo had the best performance, with 74.2% correct answers, and all three Llama-2 models underperformed. The LLMs tended to excel in statistics but to underperform in clinical areas, with the exception of GPT-4-turbo, which performed comparably to upper-level radiation oncology trainees and better than lower-level trainees.

"Future research will need to evaluate the performance of models that are fine-tune trained in clinical oncology," concluded the investigators. "This study also underscores the need for rigorous validation of LLM-generated information against established medical literature and expert consensus, necessitating expert oversight in their application in medical education and practice."

"The study highlights the potential of generative AI to revolutionize radiation oncology education and practice. OpenAI's GPT-4-turbo demonstrates that AI can complement medical training, suggesting a future where AI aids in improving patient outcomes. It's essential, though, to validate these technologies rigorously and involve experts to ensure their reliable and effective use in healthcare," says Douglas Flora, MD, Editor-in-Chief of AI in Precision Oncology.

Nikhil G. Thaker, Navid Redjal, Arturo Loaiza-Bonilla, David Penberthy, Tim Showalter, Ajay Choudhri, Shirnett Williamson, Gautam Thaker, Chirag Shah, Matthew C. Ward, Mihir Thaker, Michael Arcaro.
Large Language Models Encode Radiation Oncology Domain Knowledge: Performance on the American College of Radiology Standardized Examination.
AI in Precision Oncology, 2024. doi: 10.1089/aipo.2023.0007
