The dataset, which covers publications, open research datasets, patents, grants, and clinical trials from 2009 to 2021, was curated from the Medline and Dimensions databases. The study's primary objective was to make the vast and rapidly evolving field of Health AI easier to navigate by creating a comprehensive, accessible bibliographic resource.
"Our goal was to provide a dataset that empowers the Health AI community to harness the full potential of AI technologies in improving healthcare outcomes," said Xuanyu Shi, a PhD candidate at Peking University. "By integrating diverse sources of information, we have created a resource that can drive further innovation and facilitate a more coherent research ecosystem."
The study's methodology involved identifying relevant documents using Medical Subject Headings (MeSH) and Fields of Research (FoR) terms, then mapping these documents to specific health problems and AI technologies. The resulting dataset adheres to the FAIR (Findable, Accessible, Interoperable, Reusable) principles, which supports its use across a wide range of applications in Health AI research.
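As a rough illustration of this kind of term-based identification and mapping, the following Python sketch filters a few made-up records by subject terms and then labels each match with the health problems and AI technologies it mentions. The term lists, record fields, and function names are assumptions chosen for illustration; they are not the study's actual search strategy or data schema.

```python
# Illustrative term lists only (assumed); the study's real MeSH/FoR queries
# are not reproduced here.
AI_TERMS = {"Artificial Intelligence", "Machine Learning", "Deep Learning"}
HEALTH_PROBLEM_TERMS = {"Neoplasms", "Diabetes Mellitus"}


def is_health_ai(record):
    """Keep a record only if it carries at least one AI term and one health term."""
    terms = set(record["terms"])
    return bool(terms & AI_TERMS) and bool(terms & HEALTH_PROBLEM_TERMS)


def map_record(record):
    """Attach the matched health problems and AI technologies to a record."""
    terms = set(record["terms"])
    return {
        "doc_id": record["doc_id"],
        "doc_type": record["doc_type"],
        "ai_technologies": sorted(terms & AI_TERMS),
        "health_problems": sorted(terms & HEALTH_PROBLEM_TERMS),
    }


# Hypothetical records standing in for indexed documents.
records = [
    {"doc_id": "pub-1", "doc_type": "publication",
     "terms": ["Deep Learning", "Neoplasms", "Humans"]},
    {"doc_id": "pat-1", "doc_type": "patent",
     "terms": ["Machine Learning", "Robotics"]},
    {"doc_id": "trial-1", "doc_type": "clinical trial",
     "terms": ["Artificial Intelligence", "Diabetes Mellitus"]},
]

selected = [map_record(r) for r in records if is_health_ai(r)]
for row in selected:
    print(row)
```

In this toy example only the records that combine an AI term with a health term are kept, so the patent that mentions machine learning without a health problem is dropped.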
The dataset includes 96,332 Health AI documents: 75,820 publications, 638 open research datasets, 11,226 patents, 6,113 grants, and 2,535 clinical trials. This extensive collection is designed to facilitate horizontal scanning of funding, research, clinical assessments, and innovations within the Health AI field.
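The breakdown can be checked with a few lines of Python, using only the counts reported above. The variable names are arbitrary, and the percentage shares are derived here rather than quoted from the study.

```python
# Document counts by type, as reported for the dataset.
counts = {
    "publications": 75_820,
    "open research datasets": 638,
    "patents": 11_226,
    "grants": 6_113,
    "clinical trials": 2_535,
}

# The per-type counts should add up to the reported total of 96,332.
total = sum(counts.values())

# Share of each document type, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

print(f"total: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The counts sum to the stated total, and publications account for roughly 79% of the collection.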
"This dataset represents a significant step forward in Health AI research," said Jian Du, Assistant Professor at Peking University. "By providing a structured and comprehensive resource, we hope to support the Health AI community in developing evidence-based policies, fostering cross-disciplinary collaboration, and ultimately improving healthcare outcomes."
Shi X, Yin D, Bai Y, Zhao W, Guo X, Sun H, Cui D, Du J.
A Bibliographic Dataset of Health Artificial Intelligence Research.
Health Data Sci. 2024 Apr 5;4:0125. doi: 10.34133/hds.0125