Clinicians could be Fooled by Biased AI, Despite Explanations

AI models in health care are a double-edged sword, with models improving diagnostic decisions for some demographics, but worsening decisions for others when the model has absorbed biased medical data.

Given the very real life and death risks of clinical decision-making, researchers and policymakers are taking steps to ensure AI models are safe, secure and trustworthy - and that their use will lead to improved outcomes.

The U.S. Food and Drug Administration has oversight of software powered by AI and machine learning used in health care and has issued guidance for developers. This includes a call to ensure the logic used by AI models is transparent or explainable so that clinicians can review the underlying reasoning.

However, a new study in JAMA finds that even with provided AI explanations, clinicians can be fooled by biased AI models.

"The problem is that the clinician has to understand what the explanation is communicating and the explanation itself," said first author Sarah Jabbour, a Ph.D. candidate in computer science and engineering at the College of Engineering at the University of Michigan.

The U-M team studied AI models and AI explanations in patients with acute respiratory failure.

"Determining why a patient has respiratory failure can be difficult. In our study, we found clinicians baseline diagnostic accuracy to be around 73%," said Michael Sjoding, M.D., associate professor of internal medicine at the U-M Medical School, a co-senior author on the study.

"During the normal diagnostic process, we think about a patient’s history, lab tests and imaging results, and try to synthesize this information and come up with a diagnosis. It makes sense that a model could help improve accuracy."

Jabbour, Sjoding, co-senior author, Jenna Wiens, Ph.D., associate professor of computer science and engineering and their multidisciplinary team designed a study to evaluate the diagnostic accuracy of 457 hospitalist physicians, nurse practitioners and physician assistants with and without assistance from an AI model.

Each clinician was asked to make treatment recommendations based on their diagnoses. Half were randomized to receive an AI explanation with the AI model decision, while the other half received only the AI decision with no explanation.

Clinicians were then given real clinical vignettes of patients with respiratory failure, as well as a rating from the AI model on whether the patient had pneumonia, heart failure or COPD.

In the half of participants who were randomized to see explanations, the clinician was provided a heatmap, or visual representation, of where the AI model was looking in the chest radiograph, which served as the basis for the diagnosis.

The team found that clinicians who were presented with an AI model trained to make reasonably accurate predictions, but without explanations, had their own accuracy increase by 2.9 percentage points. When provided an explanation, their accuracy increased by 4.4 percentage points.

However, to test whether an explanation could enable clinicians to recognize when an AI model is clearly biased or incorrect, the team also presented clinicians with models intentionally trained to be biased - for example, a model predicting a high likelihood of pneumonia if the patient was 80 years old or older.

"AI models are susceptible to shortcuts, or spurious correlations in the training data. Given a dataset in which women are underdiagnosed with heart failure, the model could pick up on an association between being female and being at lower risk for heart failure," explained Wiens.

"If clinicians then rely on such a model, it could amplify existing bias. If explanations could help clinicians identify incorrect model reasoning this could help mitigate the risks."

When clinicians were shown the biased AI model, however, it decreased their accuracy by 11.3 percentage points and explanations which explicitly highlighted that the AI was looking at non-relevant information (such as low bone density in patients over 80 years) did not help them recover from this serious decline in performance.

The observed decline in performance aligns with previous studies that find users may be deceived by models, noted the team.

"There's still a lot to be done to develop better explanation tools so that we can better communicate to clinicians why a model is making specific decisions in a way that they can understand. It’s going to take a lot of discussion with experts across disciplines," Jabbour said.

The team hopes this study will spur more research into the safe implementation of AI-based models in health care across all populations and for medical education around AI and bias.

Jabbour S, Fouhey D, Shepard S, et al.
Measuring the Impact of AI in the Diagnosis of Hospitalized Patients: A Randomized Clinical Vignette Survey Study.
JAMA. 2023;330(23):2275-2284. doi: 10.1001/jama.2023.22295

Most Popular Now

AI may Help Clinicians Personalize Treat…

Individuals with generalized anxiety disorder (GAD), a condition characterized by daily excessive worry lasting at least six months, have a high relapse rate even after receiving treatment. Artificial intelligence (AI)...

Mobile App Tracking Blood Pressure Helps…

The AHOMKA platform, an innovative mobile app for patient-to-provider communication that developed through a collaboration between the School of Engineering and leading medical institutions in Ghana, has yielded positive results...

Can AI Help Detect Cognitive Impairment?

Mild cognitive impairment (MCI) can be an early indicator of Alzheimer's disease or dementia, so identifying those with cognitive issues early could lead to interventions and better outcomes. But diagnosing...

Accelerating NHS Digital Maturity: Paper…

Digitised clinical noting at South Tees Hospitals NHS Foundation Trust is creating efficiencies for busy doctors and nurses. The trust’s CCIO Dr Andrew Adair, deputy CCIO Dr John Greenaway, and...

AI can Open Up Beds in the ICU

At the height of the COVID-19 pandemic, hospitals frequently ran short of beds in intensive care units. But even earlier, ICUs faced challenges in keeping beds available. With an aging...

Customized Smartphone App Shows Promise …

A growing body of research indicates that older adults in assisted living facilities can delay or even prevent cognitive decline through interventions that combine multiple activities, such as improving diet...

New Study Shows Promise for Gamified mHe…

A new study published in Multiple Sclerosis and Related Disorders highlights the potential of More Stamina, a gamified mobile health (mHealth) app designed to help people with Multiple Sclerosis (MS)...

Patients' Affinity for AI Messages …

In a Duke Health-led survey, patients who were shown messages written either by artificial intelligence (AI) or human clinicians indicated a preference for responses drafted by AI over a human...

New Research Explores How AI can Build T…

In today’s economy, many workers have transitioned from manual labor toward knowledge work, a move driven primarily by technological advances, and workers in this domain face challenges around managing non-routine...

AI Tool Helps Predict Who will Benefit f…

A study led by UCLA investigators shows that artificial intelligence (AI) could play a key role in improving treatment outcomes for men with prostate cancer by helping physicians determine who...

AI in Healthcare: How do We Get from Hyp…

The Highland Marketing advisory board met to consider the government's enthusiasm for AI. To date, healthcare has mostly experimented with decision support tools, and their impact on the NHS and...

New AI Tool Accelerates Disease Treatmen…

University of Virginia School of Medicine scientists have created a computational tool to accelerate the development of new disease treatments. The tool goes beyond current artificial intelligence (AI) approaches by...