Almost All Leading AI Chatbots Show Signs of Cognitive Decline

Almost all leading large language models or "chatbots" show signs of mild cognitive impairment in tests widely used to spot early signs of dementia, finds a study in the Christmas issue of The BMJ.

The results also show that "older" versions of chatbots, like older patients, tend to perform worse on the tests. The authors say these findings "challenge the assumption that artificial intelligence will soon replace human doctors."

Huge advances in the field of artificial intelligence have led to a flurry of excited and fearful speculation as to whether chatbots can surpass human physicians.

Several studies have shown large language models (LLMs) to be remarkably adept at a range of medical diagnostic tasks, but their susceptibility to human impairments such as cognitive decline have not yet been examined.

To fill this knowledge gap, researchers assessed the cognitive abilities of the leading, publicly available LLMs - ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 "Sonnet" (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet) - using the Montreal Cognitive Assessment (MoCA) test.

The MoCA test is widely used to detect cognitive impairment and early signs of dementia, usually in older adults. Through a number of short tasks and questions, it assesses abilities including attention, memory, language, visuospatial skills, and executive functions. The maximum score is 30 points, with a score of 26 or above generally considered normal.

The instructions given to the LLMs for each task were the same as those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist.

ChatGPT 4o achieved the highest score on the MoCA test (26 out of 30), followed by ChatGPT 4 and Claude (25 out of 30), with Gemini 1.0 scoring lowest (16 out of 30).

All chatbots showed poor performance in visuospatial skills and executive tasks, such as the trail making task (connecting encircled numbers and letters in ascending order) and the clock drawing test (drawing a clock face showing a specific time). Gemini models failed at the delayed recall task (remembering a five word sequence).

Most other tasks, including naming, attention, language, and abstraction were performed well by all chatbots.

But in further visuospatial tests, chatbots were unable to show empathy or accurately interpret complex visual scenes. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test, which uses combinations of colour names and font colours to measure how interference affects reaction time.

These are observational findings and the authors acknowledge the essential differences between the human brain and large language models.

However, they point out that the uniform failure of all large language models in tasks requiring visual abstraction and executive function highlights a significant area of weakness that could impede their use in clinical settings.

As such, they conclude: "Not only are neurologists unlikely to be replaced by large language models any time soon, but our findings suggest that they may soon find themselves treating new, virtual patients - artificial intelligence models presenting with cognitive impairment."

Dayan R, Uliel B, Koplewitz G.
Age against the machine-susceptibility of large language models to cognitive impairment: cross sectional analysis.
BMJ. 2024 Dec 19;387:e081948. doi: 10.1136/bmj-2024-081948

Most Popular Now

Researchers Find Telemedicine may Help R…

Low-value care - medical tests and procedures that provide little to no benefit to patients - contributes to excess medical spending and both direct and cascading harms to patients. A...

AI Revolutionizes Glaucoma Care

Imagine walking into a supermarket, train station, or shopping mall and having your eyes screened for glaucoma within seconds - no appointment needed. With the AI-based Glaucoma Screening (AI-GS) network...

AI may Help Clinicians Personalize Treat…

Individuals with generalized anxiety disorder (GAD), a condition characterized by daily excessive worry lasting at least six months, have a high relapse rate even after receiving treatment. Artificial intelligence (AI)...

Accelerating NHS Digital Maturity: Paper…

Digitised clinical noting at South Tees Hospitals NHS Foundation Trust is creating efficiencies for busy doctors and nurses. The trust’s CCIO Dr Andrew Adair, deputy CCIO Dr John Greenaway, and...

Mobile App Tracking Blood Pressure Helps…

The AHOMKA platform, an innovative mobile app for patient-to-provider communication that developed through a collaboration between the School of Engineering and leading medical institutions in Ghana, has yielded positive results...

AI can Open Up Beds in the ICU

At the height of the COVID-19 pandemic, hospitals frequently ran short of beds in intensive care units. But even earlier, ICUs faced challenges in keeping beds available. With an aging...

Can AI Help Detect Cognitive Impairment?

Mild cognitive impairment (MCI) can be an early indicator of Alzheimer's disease or dementia, so identifying those with cognitive issues early could lead to interventions and better outcomes. But diagnosing...

Customized Smartphone App Shows Promise …

A growing body of research indicates that older adults in assisted living facilities can delay or even prevent cognitive decline through interventions that combine multiple activities, such as improving diet...

New Study Shows Promise for Gamified mHe…

A new study published in Multiple Sclerosis and Related Disorders highlights the potential of More Stamina, a gamified mobile health (mHealth) app designed to help people with Multiple Sclerosis (MS)...

AI Model Predicting Two-Year Risk of Com…

AFib (short for atrial fibrillation), a common heart rhythm disorder in adults, can have disastrous consequences including life-threatening blood clots and stroke if left undetected or untreated. A new study...

Patients' Affinity for AI Messages …

In a Duke Health-led survey, patients who were shown messages written either by artificial intelligence (AI) or human clinicians indicated a preference for responses drafted by AI over a human...

New Research Explores How AI can Build T…

In today’s economy, many workers have transitioned from manual labor toward knowledge work, a move driven primarily by technological advances, and workers in this domain face challenges around managing non-routine...