Almost All Leading AI Chatbots Show Signs of Cognitive Decline

Almost all leading large language models or "chatbots" show signs of mild cognitive impairment in tests widely used to spot early signs of dementia, finds a study in the Christmas issue of The BMJ.

The results also show that "older" versions of chatbots, like older patients, tend to perform worse on the tests. The authors say these findings "challenge the assumption that artificial intelligence will soon replace human doctors."

Huge advances in the field of artificial intelligence have led to a flurry of excited and fearful speculation as to whether chatbots can surpass human physicians.

Several studies have shown large language models (LLMs) to be remarkably adept at a range of medical diagnostic tasks, but their susceptibility to human impairments such as cognitive decline have not yet been examined.

To fill this knowledge gap, researchers assessed the cognitive abilities of the leading, publicly available LLMs - ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 "Sonnet" (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet) - using the Montreal Cognitive Assessment (MoCA) test.

The MoCA test is widely used to detect cognitive impairment and early signs of dementia, usually in older adults. Through a number of short tasks and questions, it assesses abilities including attention, memory, language, visuospatial skills, and executive functions. The maximum score is 30 points, with a score of 26 or above generally considered normal.

The instructions given to the LLMs for each task were the same as those given to human patients. Scoring followed official guidelines and was evaluated by a practising neurologist.

ChatGPT 4o achieved the highest score on the MoCA test (26 out of 30), followed by ChatGPT 4 and Claude (25 out of 30), with Gemini 1.0 scoring lowest (16 out of 30).

All chatbots showed poor performance in visuospatial skills and executive tasks, such as the trail making task (connecting encircled numbers and letters in ascending order) and the clock drawing test (drawing a clock face showing a specific time). Gemini models failed at the delayed recall task (remembering a five word sequence).

Most other tasks, including naming, attention, language, and abstraction were performed well by all chatbots.

But in further visuospatial tests, chatbots were unable to show empathy or accurately interpret complex visual scenes. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test, which uses combinations of colour names and font colours to measure how interference affects reaction time.

These are observational findings and the authors acknowledge the essential differences between the human brain and large language models.

However, they point out that the uniform failure of all large language models in tasks requiring visual abstraction and executive function highlights a significant area of weakness that could impede their use in clinical settings.

As such, they conclude: "Not only are neurologists unlikely to be replaced by large language models any time soon, but our findings suggest that they may soon find themselves treating new, virtual patients - artificial intelligence models presenting with cognitive impairment."

Dayan R, Uliel B, Koplewitz G.
Age against the machine-susceptibility of large language models to cognitive impairment: cross sectional analysis.
BMJ. 2024 Dec 19;387:e081948. doi: 10.1136/bmj-2024-081948

Most Popular Now

Is Your Marketing Effective for an NHS C…

How can you make sure you get the right message across to an NHS chief information officer, or chief nursing information officer? Replay this webinar with Professor Natasha Phillips, former...

We could Soon Use AI to Detect Brain Tum…

A new paper in Biology Methods and Protocols, published by Oxford University Press, shows that scientists can train artificial intelligence (AI) models to distinguish brain tumors from healthy tissue. AI...

Welcome Evo, Generative AI for the Genom…

Brian Hie runs the Laboratory of Evolutionary Design at Stanford, where he works at the crossroads of artificial intelligence and biology. Not long ago, Hie pondered a provocative question: If...

Telehealth Significantly Boosts Treatmen…

New research reveals a dramatic improvement in diagnosing and curing people living with hepatitis C in rural communities using both telemedicine and support from peers with lived experience in drug...

AI can Predict Study Results Better than…

Large language models, a type of AI that analyses text, can predict the results of proposed neuroscience studies more accurately than human experts, finds a new study led by UCL...

Using AI to Treat Infections more Accura…

New research from the Centres for Antimicrobial Optimisation Network (CAMO-Net) at the University of Liverpool has shown that using artificial intelligence (AI) can improve how we treat urinary tract infections...

Research Study Shows the Cost-Effectiven…

Earlier research showed that primary care clinicians using AI-ECG tools identified more unknown cases of a weak heart pump, also called low ejection fraction, than without AI. New study findings...

New Guidance for Ensuring AI Safety in C…

As artificial intelligence (AI) becomes more prevalent in health care, organizations and clinicians must take steps to ensure its safe implementation and use in real-world clinical settings, according to an...

Remote Telemedicine Tool Found Highly Ac…

Collecting images of suspicious-looking skin growths and sending them off-site for specialists to analyze is as accurate in identifying skin cancers as having a dermatologist examine them in person, a...

Philips Aims to Advance Cardiac MRI Tech…

Royal Philips (NYSE: PHG, AEX: PHIA) and Mayo Clinic announced a research collaboration aimed at advancing MRI for cardiac applications. Through this investigation, Philips and Mayo Clinic will look to...

Deep Learning Model Accurately Diagnoses…

Using just one inhalation lung CT scan, a deep learning model can accurately diagnose and stage chronic obstructive pulmonary disease (COPD), according to a study published today in Radiology: Cardiothoracic...

Shape-Changing Device Helps Visually Imp…

Researchers from Imperial College London, working with the company MakeSense Technology and the charity Bravo Victor, have developed a shape-changing device called Shape that helps people with visual impairment navigate...