AI can Predict Study Results Better than Human Experts

Large language models, a type of AI that analyses text, can predict the results of proposed neuroscience studies more accurately than human experts, finds a new study led by UCL (University College London) researchers.

The findings, published in Nature Human Behaviour, demonstrate that large language models (LLMs) trained on vast datasets of text can distil patterns from scientific literature, enabling them to forecast scientific outcomes with superhuman accuracy.

The researchers say this highlights LLMs' potential as powerful tools for accelerating research, going far beyond just knowledge retrieval.

Lead author Dr Ken Luo (UCL Psychology & Language Sciences) said: "Since the advent of generative AI like ChatGPT, much research has focused on LLMs' question-answering capabilities, showcasing their remarkable skill in summarising knowledge from extensive training data. However, rather than emphasising their backward-looking ability to retrieve past information, we explored whether LLMs could synthesise knowledge to predict future outcomes.

"Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature. Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments."

The international research team began their study by developing BrainBench, a tool to evaluate how well LLMs can predict neuroscience results.

BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is a real study abstract that briefly describes the background of the research, the methods used, and the study results. In the other version, the background and methods are the same, but the results have been modified by experts in the relevant neuroscience domain to a plausible but incorrect outcome.

The researchers tested 15 different general-purpose LLMs and 171 human neuroscience experts (who had all passed a screening test to confirm their expertise) to see whether the AI or the person could correctly determine which of the two paired abstracts was the real one with the actual study results.

All of the LLMs outperformed the neuroscientists, with the LLMs averaging 81% accuracy and the humans averaging 63% accuracy. Even when the study team restricted the human responses to only those with the highest degree of expertise for a given domain of neuroscience (based on self-reported expertise), the accuracy of the neuroscientists still fell short of the LLMs, at 66%. Additionally, the researchers found that when LLMs were more confident in their decisions, they were more likely to be correct.* The researchers say this finding paves the way for a future where human experts could collaborate with well-calibrated models.

The researchers then adapted an existing LLM (a version of Mistral, an open-source LLM) by training it on neuroscience literature specifically. The new LLM specialising in neuroscience, which they dubbed BrainGPT, was even better at predicting study results, attaining 86% accuracy (an improvement on the general-purpose version of Mistral, which was 83% accurate).

Senior author Professor Bradley Love (UCL Psychology & Language Sciences) said: "In light of our results, we suspect it won't be long before scientists are using AI tools to design the most effective experiment for their question. While our study focused on neuroscience, our approach was universal and should successfully apply across all of science.

"What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory."

Dr Luo added: "Building on our results, we are developing AI tools to assist researchers. We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes. This would enable faster iteration and more informed decision-making in experiment design."

The study was supported by the Economic and Social Research Council (ESRC), Microsoft, and a Royal Society Wolfson Fellowship, and involved researchers at UCL, the University of Cambridge, the University of Oxford, the Max Planck Institute for Neurobiology of Behavior (Germany), Bilkent University (Turkey) and other institutions in the UK, US, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain and Australia.

Luo X, Rechardt A, Sun G, Nejad KK, Yáñez F, Yilmaz B, Lee K, Cohen AO, Borghesani V, Pashkov A, Marinazzo D, Nicholas J, Salatiello A, Sucholutsky I, Minervini P, Razavi S, Rocca R, Yusifov E, Okalova T, Gu N, Ferianc M, Khona M, Patil KR, Lee PS, Mata R, Myers NE, Bizley JK, Musslick S, Bilgin IP, Niso G, Ales JM, Gaebler M, Ratan Murty NA, Loued-Khenissi L, Behler A, Hall CM, Dafflon J, Bao SD, Love BC.
Large language models surpass human experts in predicting neuroscience results.
Nat Hum Behav. 2024 Nov 27. doi: 10.1038/s41562-024-02046-9

* When presented with two abstracts, the LLM computes the likelihood of each, assigning a perplexity score to represent how surprising each is based on its own learned knowledge as well as the context (background and method). The researchers assessed LLMs' confidence by measuring the difference in how surprising/perplexing the models found real versus fake abstracts - the greater this difference, the greater the confidence, which correlated with a higher likelihood the LLM had picked the correct abstract.
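The scoring described in this note can be illustrated with a minimal Python sketch. It is not the study's code. It assumes that a language model has already produced a log-probability for each token of each abstract, and it takes perplexity in its standard form, as the exponential of the negative mean token log-probability. The function names and the log-probability values are hypothetical and made up for illustration.

```python
import math


def perplexity(token_logprobs):
    """Perplexity as exp of the negative mean per-token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def choose_abstract(logprobs_a, logprobs_b):
    """Pick the less surprising abstract and report a confidence score.

    Confidence here is the absolute gap between the two perplexities,
    an assumed reading of "the difference in how surprising" each
    version is; a larger gap means a more confident choice.
    """
    ppl_a = perplexity(logprobs_a)
    ppl_b = perplexity(logprobs_b)
    choice = "A" if ppl_a < ppl_b else "B"
    confidence = abs(ppl_a - ppl_b)
    return choice, confidence


# Made-up per-token log-probabilities for two versions of one abstract.
# The versions share background and methods and differ only in results,
# so the first tokens match and the later ones diverge.
version_a = [-1.2, -0.8, -2.0, -0.5]
version_b = [-1.2, -0.8, -3.5, -2.9]

choice, confidence = choose_abstract(version_a, version_b)
print(choice, round(confidence, 2))
```

In this sketch the model "picks" whichever version it finds less perplexing, and ties go to version B. Checking whether larger gaps go with more correct picks across many pairs would be one way to examine the calibration the researchers describe.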
