The work, developed by a team led by Penn State College of Medicine researchers, outperforms existing methodologies and identified 26% more novel gene and trait associations, the researchers said. They published their work in Nature Communications.
"We all carry some DNA mutations, and we need to figure out how any one of these mutations may influence gene expression linked to disease so we can predict disease risk early. This is especially important for autoimmune disease," said Dajiang Liu, distinguished professor, vice chair for research, and director of artificial intelligence and biomedical informatics at the Penn State College of Medicine and co-senior author of the study. "If an AI algorithm can more accurately predict disease risk, it means we can carry out interventions earlier."
Genetics often underpins disease development. Variations in DNA can influence gene expression, or the process by which the information in DNA is converted into functional products such as proteins. How much or how little a gene is expressed can influence disease risk.
Genome-wide association studies (GWAS), a popular approach in human genetics research, can home in on regions of the genome associated with a particular disease or trait but can't pinpoint the specific genes that affect disease risk. It’s like sharing your location with a friend with the precise location setting turned off on your smartphone - the city might be obvious, but the address is obscured. Existing methods are also limited in the granularity of their analysis. Gene expression can be specific to certain types of cells. If the analysis doesn’t distinguish between distinct cell types, the results may overlook real causal relationships between genetic variants and gene expression.
The research team's method, dubbed EXPRESSO for EXpression PREdiction with Summary Statistics Only, applies a more advanced artificial intelligence algorithm and analyzes data from single-cell expression quantitative trait loci, a type of data that links genetic variants to the genes they regulate. It also integrates 3D genomic data and epigenetic data, which capture how genes may be modified by the environment to influence disease, into its modeling. The team applied EXPRESSO to GWAS datasets for 14 autoimmune diseases, including lupus, Crohn’s disease, ulcerative colitis and rheumatoid arthritis.
"With this new method, we were able to identify many more risk genes for autoimmune disease that actually have cell-type specific effects, meaning that they only have effects in a particular cell type and not others," said Bibo Jiang, assistant professor at the Penn State College of Medicine and senior author of the study.
The team then used this information to identify potential therapeutics for autoimmune disease. Currently, there aren't good long-term treatment options, they said.
"Most treatments are designed to mitigate symptoms, not cure the disease. It’s a dilemma knowing that autoimmune disease needs long-term treatment, but the existing treatments often have such bad side effects that they can’t be used for long. Yet, genomics and AI offer a promising route to develop novel therapeutics," said Laura Carrel, professor of biochemistry and molecular biology at the Penn State College of Medicine and co-senior author of the study.
The team's work pointed to drug compounds that could reverse gene expression in cell types associated with an autoimmune disease, such as vitamin K for ulcerative colitis and, for type 1 diabetes, metformin, which is typically prescribed for type 2 diabetes. These drugs, already approved by the Food and Drug Administration as safe and effective for treating other diseases, could potentially be repurposed.
The research team is working with collaborators to validate their findings in a laboratory setting and, ultimately, in clinical trials.
Lida Wang, a doctoral student in the biostatistics program, and Chachrit Khunsriraksakul, who earned a doctorate in bioinformatics and genomics in 2022 and his medical degree in May from Penn State, co-led the study. Other Penn State College of Medicine authors on the paper include: Havell Markus, who is pursuing a doctorate and a medical degree; Dieyi Chen, doctoral candidate; Fan Zhang, graduate student; and Fang Chen, postdoctoral scholar. Xiaowei Zhan, associate professor at UT Southwestern Medical Center, also contributed to the paper.
Funding from the National Institutes of Health (grant numbers R01HG011035, R01AI174108 and R01ES036042) and the Artificial Intelligence and Biomedical Informatics pilot grant from the Penn State College of Medicine supported this work.
Wang L, Khunsriraksakul C, Markus H, Chen D, Zhang F, Chen F, Zhan X, Carrel L, Liu DJ, Jiang B.
Integrating single cell expression quantitative trait loci summary statistics to understand complex trait risk genes.
Nat Commun. 2024 May 20;15(1):4260. doi: 10.1038/s41467-024-48143-1