Teaching AI to Ask Clinical Questions

Physicians often query a patient's electronic health record for information that helps them make treatment decisions, but the cumbersome nature of these records hampers the process. Research has shown that even when a doctor has been trained to use an electronic health record (EHR), finding an answer to just one question can take, on average, more than eight minutes.

The more time physicians must spend navigating an oftentimes clunky EHR interface, the less time they have to interact with patients and provide treatment.

Researchers have begun developing machine-learning models that can streamline the process by automatically finding information physicians need in an EHR. However, training effective models requires huge datasets of relevant medical questions, which are often hard to come by due to privacy restrictions. Existing models struggle to generate authentic questions (those that a human doctor would ask) and are often unable to find correct answers.

To overcome this data shortage, researchers at MIT partnered with medical experts to study the questions physicians ask when reviewing EHRs. Then, they built a publicly available dataset of more than 2,000 clinically relevant questions written by these medical experts.

When they used their dataset to train a machine-learning model to generate clinical questions, they found that, judged against real questions from medical experts, the model asked high-quality, authentic questions more than 60 percent of the time.

With this dataset, they plan to generate vast numbers of authentic medical questions and then use those questions to train a machine-learning model that would help doctors find sought-after information in a patient's record more efficiently.

"Two thousand questions may sound like a lot, but when you look at machine-learning models being trained nowadays, they have so much data, maybe billions of data points. When you train machine-learning models to work in health care settings, you have to be really creative because there is such a lack of data," says lead author Eric Lehman, a graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

The senior author is Peter Szolovits, a professor in the Department of Electrical Engineering and Computer Science (EECS) who heads the Clinical Decision-Making Group in CSAIL and is also a member of the MIT-IBM Watson AI Lab. The research paper, a collaboration between co-authors at MIT, the MIT-IBM Watson AI Lab, IBM Research, and the doctors and medical experts who helped create questions and participated in the study, will be presented at the annual conference of the North American Chapter of the Association for Computational Linguistics.

"Realistic data is critical for training models that are relevant to the task yet difficult to find or create," Szolovits says. "The value of this work is in carefully collecting questions asked by clinicians about patient cases, from which we are able to develop methods that use these data and general language models to ask further plausible questions."

Data deficiency

The few large datasets of clinical questions the researchers were able to find had a host of issues, Lehman explains. Some were composed of medical questions asked by patients on web forums, which are a far cry from physician questions. Other datasets contained questions produced from templates, so the questions were mostly identical in structure and many of them were unrealistic.

"Collecting high-quality data is really important for doing machine-learning tasks, especially in a health care context, and we’ve shown that it can be done," Lehman says.

To build their dataset, the MIT researchers worked with practicing physicians and medical students in their last year of training. They gave these medical experts more than 100 EHR discharge summaries and told them to read through a summary and ask any questions they might have. The researchers didn't put any restrictions on question types or structures in an effort to gather natural questions. They also asked the medical experts to identify the “trigger text” in the EHR that led them to ask each question.

For instance, a medical expert might read a note in the EHR that says a patient's past medical history is significant for prostate cancer and hypothyroidism. The trigger text "prostate cancer" could lead the expert to ask questions like "date of diagnosis?" or "any interventions done?"
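One way to picture a single entry in such a dataset is as a note, a trigger span inside that note, and the question the span prompted. The sketch below is illustrative only: the class name `TriggeredQuestion`, its fields, and the helper `make_example` are hypothetical and are not taken from the researchers' released data format. The note text paraphrases the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggeredQuestion:
    """Hypothetical record: a question tied to a span of the note."""
    note_text: str
    trigger_start: int  # character offset where the trigger begins
    trigger_end: int    # character offset just past the trigger
    question: str

    @property
    def trigger_text(self) -> str:
        # Recover the trigger from its offsets into the note.
        return self.note_text[self.trigger_start:self.trigger_end]


def make_example(note_text: str, trigger: str, question: str) -> TriggeredQuestion:
    """Locate the first occurrence of the trigger in the note and build a record."""
    start = note_text.find(trigger)
    if start < 0:
        raise ValueError(f"trigger {trigger!r} not found in note")
    return TriggeredQuestion(note_text, start, start + len(trigger), question)


# Assumed note text, paraphrasing the example in the article.
note = "Past medical history is significant for prostate cancer and hypothyroidism."
examples = [
    make_example(note, "prostate cancer", q)
    for q in ("date of diagnosis?", "any interventions done?")
]

for ex in examples:
    print(f"{ex.trigger_text!r} -> {ex.question!r}")
```

Storing offsets rather than only the trigger string keeps each question attached to a specific place in the note, which matters when the same phrase appears more than once.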

They found that most questions focused on symptoms, treatments, or the patient's test results. While these findings weren't unexpected, quantifying the number of questions about each broad topic will help them build an effective dataset for use in a real, clinical setting, says Lehman.

Once they had compiled their dataset of questions and accompanying trigger text, they used it to train machine-learning models to ask new questions based on the trigger text.

Then the medical experts determined whether those questions were "good" using four metrics: understandability (Does the question make sense to a human physician?), triviality (Is the question too easily answerable from the trigger text?), medical relevance (Does it make sense to ask this question based on the context?), and relevancy to the trigger (Is the trigger related to the question?).
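As a rough illustration of how four such judgments could be turned into a single "good question" rate, the sketch below assumes a question counts as good only if it passes all four checks. That combination rule is an assumption made for this example, not something the article states, and the names `Rating` and `good_rate` as well as the toy ratings are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rating:
    """Hypothetical per-question judgment from one expert reviewer."""
    understandable: bool       # makes sense to a human physician
    nontrivial: bool           # not answerable directly from the trigger text
    medically_relevant: bool   # sensible to ask given the context
    relevant_to_trigger: bool  # the trigger is related to the question

    def is_good(self) -> bool:
        # Assumption: "good" means passing every one of the four checks.
        return all((
            self.understandable,
            self.nontrivial,
            self.medically_relevant,
            self.relevant_to_trigger,
        ))


def good_rate(ratings: Sequence[Rating]) -> float:
    """Fraction of rated questions that pass all four checks."""
    if not ratings:
        raise ValueError("no ratings supplied")
    return sum(r.is_good() for r in ratings) / len(ratings)


# Toy ratings invented for illustration; not data from the study.
toy_ratings = [
    Rating(True, True, True, True),
    Rating(True, False, True, True),   # too trivial
    Rating(True, True, True, True),
    Rating(True, True, True, True),
]
print(f"good-question rate: {good_rate(toy_ratings):.0%}")
```

Under a stricter or looser rule (for example, majority vote across several reviewers) the reported rate would differ, so the combination rule should be stated alongside any such figure.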

Cause for concern

The researchers found that when a model was given trigger text, it was able to generate a good question 63 percent of the time, whereas a human physician would ask a good question 80 percent of the time.

They also trained models to recover answers to clinical questions using the publicly available datasets they had found at the outset of this project. Then they tested these trained models to see if they could find answers to "good" questions asked by human medical experts.

The models were only able to recover about 25 percent of answers to physician-generated questions.

"That result is really concerning. What people thought were good-performing models were, in practice, just awful because the evaluation questions they were testing on were not good to begin with," Lehman says.

The team is now applying this work toward their initial goal: building a model that can automatically answer physicians' questions in an EHR. For the next step, they will use their dataset to train a machine-learning model that can automatically generate thousands or millions of good clinical questions, which can then be used to train a new model for automatic question answering.

While there is still much work to do before that model could be a reality, Lehman is encouraged by the strong initial results the team demonstrated with this dataset.

Lehman E, Lialin V, Legaspi KY, Sy AJ, Pile PT, Alberto NR, Ragasa RR, Puyat CV, Alberto IR, Alfonso PG, Taliño M.
Learning to Ask Like a Physician.
arXiv preprint arXiv:2206.02696. 2022. doi: 10.48550/arXiv.2206.02696
