First Therapy Chatbot Trial Shows AI can Provide 'Gold-Standard' Care

Dartmouth researchers conducted the first clinical trial of a therapy chatbot powered by generative AI and found that the software resulted in significant improvements in participants' symptoms, according to results published in NEJM AI, a journal from the publishers of the New England Journal of Medicine.

People in the study also reported they could trust and communicate with the system, known as Therabot, to a degree that is comparable to working with a mental-health professional.

The trial consisted of 106 people from across the United States diagnosed with major depressive disorder, generalized anxiety disorder, or an eating disorder. Participants interacted with Therabot through a smartphone app by typing out responses to prompts about how they were feeling or initiating conversations when they needed to talk.

People diagnosed with depression experienced a 51% average reduction in symptoms, leading to clinically significant improvements in mood and overall well-being, the researchers report. Participants with generalized anxiety reported an average reduction in symptoms of 31%, with many shifting from moderate to mild anxiety, or from mild anxiety to below the clinical threshold for diagnosis.

Among those at risk for eating disorders, who are traditionally more challenging to treat, Therabot users showed a 19% average reduction in concerns about body image and weight, which significantly outpaced a control group that also was part of the trial.

The researchers conclude that while AI-powered therapy is still in critical need of clinician oversight, it has the potential to provide real-time support for the many people who lack regular or immediate access to a mental-health professional.

"The improvements in symptoms we observed were comparable to what is reported for traditional outpatient therapy, suggesting this AI-assisted approach may offer clinically meaningful benefits," says Nicholas Jacobson, the study's senior author and an associate professor of biomedical data science and psychiatry in Dartmouth's Geisel School of Medicine.

"There is no replacement for in-person care, but there are nowhere near enough providers to go around," Jacobson says. For every available provider in the United States, there's an average of 1,600 patients with depression or anxiety alone, he says.

"We would like to see generative AI help provide mental health support to the huge number of people outside the in-person care system. I see the potential for person-to-person and software-based therapy to work together," says Jacobson, who is the director of the treatment development and evaluation core at Dartmouth's Center for Technology and Behavioral Health.

Michael Heinz, the study's first author and an assistant professor of psychiatry at Dartmouth, says the trial results also underscore the critical work ahead before generative AI can be used to treat people safely and effectively.

"While these results are very promising, no generative AI agent is ready to operate fully autonomously in mental health where there is a very wide range of high-risk scenarios it might encounter," says Heinz, who also is an attending psychiatrist at Dartmouth Hitchcock Medical Center in Lebanon, N.H. "We still need to better understand and quantify the risks associated with generative AI used in mental health contexts."

Therabot has been in development in Jacobson's AI and Mental Health Lab at Dartmouth since 2019. The process included continuous consultation with psychologists and psychiatrists affiliated with Dartmouth and Dartmouth Health.

When people initiate a conversation with the app, Therabot answers with natural, open-ended text dialog based on an original training set the researchers developed from current, evidence-based best practices for psychotherapy and cognitive behavioral therapy, Heinz says.

For example, if a person with anxiety tells Therabot they have been feeling very nervous and overwhelmed lately, it might respond, "Let's take a step back and ask why you feel that way." If Therabot detects high-risk content such as suicidal ideation during a conversation with a user, it will provide a prompt to call 911, or contact a suicide prevention or crisis hotline, with the press of an onscreen button.

The clinical trial provided the participants randomly selected to use Therabot with four weeks of unlimited access. The researchers also tracked the control group of 104 people with the same diagnosed conditions who had no access to Therabot.

Almost 75% of the Therabot group were not under pharmaceutical or other therapeutic treatment at the time. The app asked about people's well-being, personalizing its questions and responses based on what it learned during its conversations with participants. The researchers evaluated conversations to ensure that the software was responding within best therapeutic practices.

After four weeks, the researchers gauged a person's progress through standardized questionnaires clinicians use to detect and monitor each condition. The team did a second assessment after another four weeks when participants could initiate conversations with Therabot but no longer received prompts.

After eight weeks, all participants using Therabot experienced a marked reduction in symptoms that exceeds what clinicians consider statistically significant, Jacobson says.

These differences represent robust, real-world improvements that patients would likely notice in their daily lives, Jacobson says. Users engaged with Therabot for an average of six hours throughout the trial, or the equivalent of about eight therapy sessions, he says.

"Our results are comparable to what we would see for people with access to gold-standard cognitive therapy with outpatient providers," Jacobson says. "We're talking about potentially giving people the equivalent of the best treatment you can get in the care system over shorter periods of time."

Critically, people reported a degree of "therapeutic alliance" in line with what patients report for in-person providers, the study found. Therapeutic alliance relates to the level of trust and collaboration between a patient and their caregiver and is considered essential to successful therapy.

One indication of this bond is that people not only provided detailed responses to Therabot's prompts but also frequently initiated conversations, Jacobson says. Interactions with the software also showed upticks at times associated with unwellness, such as in the middle of the night.

"We did not expect that people would almost treat the software like a friend. It says to me that they were actually forming relationships with Therabot," Jacobson says. "My sense is that people also felt comfortable talking to a bot because it won't judge them."

The Therabot trial shows that generative AI has the potential to increase a patient's engagement and, importantly, continued use of the software, Heinz says.

"Therabot is not limited to an office and can go anywhere a patient goes. It was available around the clock for challenges that arose in daily life and could walk users through strategies to handle them in real time," Heinz says. "But the feature that allows AI to be so effective is also what confers its risk - patients can say anything to it, and it can say anything back."

The development and clinical testing of these systems need to have rigorous benchmarks for safety, efficacy, and the tone of engagement, and need to include the close supervision and involvement of mental-health experts, Heinz says.

"This trial brought into focus that the study team has to be equipped to intervene - possibly right away - if a patient expresses an acute safety concern such as suicidal ideation, or if the software responds in a way that is not in line with best practices," he says. "Thankfully, we did not see this often with Therabot, but that is always a risk with generative AI, and our study team was ready."

In evaluations of earlier versions of Therabot more than two years ago, more than 90% of responses were consistent with therapeutic best practices, Jacobson says. That gave the team the confidence to move forward with the clinical trial.

"There are a lot of folks rushing into this space since the release of ChatGPT, and it's easy to put out a proof of concept that looks great at first glance, but the safety and efficacy is not well established," Jacobson says. "This is one of those cases where diligent oversight is needed, and providing that really sets us apart in this space."

Michael V Heinz, Daniel M Mackin, Brianna M Trudeau, Sukanya Bhattacharya, Yinzhou Wang, Haley A Banta, Abi D Jewett, Abigail J Salzhauer, Tess Z Griffin, Nicholas C Jacobson.
Randomized Trial of a Generative AI Chatbot for Mental Health Treatment.
NEJM AI, 2025. doi: 10.1056/AIoa2400802
