Welcome Evo, Generative AI for the Genome

Brian Hie runs the Laboratory of Evolutionary Design at Stanford, where he works at the crossroads of artificial intelligence and biology. Not long ago, Hie pondered a provocative question: If a tool like ChatGPT can write original sentences based on patterns found in massive collections of previously written words, what happens if we replace written words with genetic code?

The answer to that seemingly simple question has become Evo, a generative AI model that writes genetic code. Hie and his colleagues at the Arc Institute and the University of California, Berkeley, introduced Evo in a paper in the journal Science. Hie says that researchers might use Evo to understand how microbial and viral genomes work, to fashion new proteins (e.g., drugs) that never existed before, and to reprogram microbes to accomplish remarkable tasks, from improving photosynthesis for carbon sequestration and higher crop yields to gobbling up microplastics from the oceans.

"Instead of having to use brute force testing or mining promising sequences from nature, all of which are quite unpredictable, we now have an AI model for generating systems of interest, allowing researchers to focus only on the most promising possibilities," said Hie, assistant professor of chemical engineering. "Evo puts the genomes of whole lifeforms within reach and accelerates the bioengineering design process."

Evo could even lead to a deeper understanding of evolution itself, new insights into genetic diseases, and new treatments, all achieved on a computer rather than in a lab.

Natural insight

The inspiration comes from nature itself. The instructions of all life are encoded in DNA. A better understanding of the complex interplay of DNA, RNA, and proteins, and of how they have evolved over time, will lead to deeper knowledge and the ability to reprogram microbes into useful technologies.

But the task is not as easy as it sounds. Even simple microbes have complex genomes with millions of base pairs. Evo makes two key advances over similar existing tools. First, it expands the "context window," the length of sequence a model can process at once, from roughly 8,000 base pairs to more than 131,000 base pairs. Second, it improves the resolution to the scale of individual nucleotides, the building blocks of DNA.
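To make "single-nucleotide resolution" and "context window" concrete, here is a minimal Python sketch. It is an illustration only and not Evo's actual tokenizer or API: the vocabulary mapping, the function names, and the rounded window sizes (8,000 and 131,000, taken from the figures above) are all assumptions.

```python
import math

# Hypothetical vocabulary: one token per nucleotide (single-nucleotide resolution).
# This mapping is an assumption for illustration, not Evo's real vocabulary.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

# Rounded context-window sizes from the article (assumed, not exact model settings).
OLD_WINDOW = 8_000
EVO_WINDOW = 131_000


def tokenize(sequence):
    """Map each nucleotide in a DNA string to one integer token."""
    return [VOCAB[base] for base in sequence.upper()]


def split_into_windows(tokens, window):
    """Split a token list into consecutive chunks of at most `window` tokens."""
    return [tokens[i:i + window] for i in range(0, len(tokens), window)]


def windows_needed(genome_length, window):
    """Number of non-overlapping windows needed to cover a genome."""
    return math.ceil(genome_length / window)


if __name__ == "__main__":
    print(tokenize("acgt"))
    # A one-million-base-pair genome needs far fewer passes with the longer window.
    print(windows_needed(1_000_000, OLD_WINDOW), windows_needed(1_000_000, EVO_WINDOW))
```

With a longer window, the model sees much more of a genome in a single pass, which is what lets it relate distant parts of a sequence to one another.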

Evo was trained on the genomes of 80,000 microbes and 2.7 million prokaryotic and phage genomes, covering 300 billion nucleotides, as well as on smaller DNA loops known as plasmids. To preempt the use of Evo for the development of bioweapons, however, the team had to exclude the genomes of viruses known to infect humans and certain other organisms.

Evo is able to learn how small changes in nucleotide sequences affect the evolutionary fitness of whole organisms, and it can generate DNA sequences of more than 1 million base pairs, more than seven times the context window of 131,000 base pairs, Hie added. By comparison, the smallest "minimal" bacterial genomes are about 580,000 base pairs in length, the researchers note.
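The length comparisons above can be checked with a few lines of arithmetic. The sketch below uses the rounded figures from the text; the variable names are illustrative only.

```python
# Rounded figures quoted in the article (base pairs).
GENERATED_LENGTH = 1_000_000   # "more than 1 million base pairs"
CONTEXT_WINDOW = 131_000       # "131,000 base pairs"
MINIMAL_GENOME = 580_000       # "about 580,000 base pairs"

# How many context windows fit into one generated sequence.
window_ratio = GENERATED_LENGTH / CONTEXT_WINDOW

# How the generated length compares with a minimal bacterial genome.
genome_ratio = GENERATED_LENGTH / MINIMAL_GENOME

if __name__ == "__main__":
    print(f"generated / context window: {window_ratio:.2f}")
    print(f"generated / minimal genome: {genome_ratio:.2f}")
```

Both ratios are greater than one: the generated sequences exceed seven context windows in length and are longer than the smallest known bacterial genomes.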

Proof of concept

As a proof of concept of Evo's design capabilities, Hie and colleagues prompted Evo to generate novel synthetic CRISPR-Cas molecular complexes and systems. CRISPR-Cas systems are like tiny molecular machines that use proteins and RNA in tandem to edit DNA. In response to that prompt, Evo created a fully functional, previously unknown CRISPR system that was validated after testing 11 possible designs. Evo's CRISPR exploration is the first example of simultaneous protein-RNA codesign using a language model, Hie noted.

Next up, Hie is already working on expanding Evo's ability to process larger genomic sequences, achieving greater control over its outputs, and broadening his research beyond the microbial world to human and other genomes.

"Evo opens up a lot of very interesting research at the intersection of machine learning and biology," Hie said. "It creates opportunities for discoveries that were previously unimaginable and accelerates our ability to engineer life itself."

Evo is open source and publicly available for interested researchers to download.

The research was supported by the Fannie and John Hertz Foundation; National Science Foundation Graduate Fellowship Program; National Center for Advancing Translational Sciences of the National Institutes of Health; National Institutes of Health; National Science Foundation grants; US DEVCOM Army Research Laboratory grants; Office of Naval Research; Stanford HAI; NXP, Xilinx, LETI-CEA, Intel, IBM, Microsoft, NEC, Toshiba, TSMC, ARM, Hitachi, BASF, Accenture, Ericsson, Qualcomm, Analog Devices, Google Cloud, Salesforce, Total, the HAI-GCP Cloud Credits for Research program, the Stanford Data Science Initiative, and members of the Stanford DAWN project: Meta, Google, and VMWare; the Arc Institute; the Rainwater Foundation; the Curci Foundation; Rose Hill Investigators Program; V. and N. Khosla; S. Altman; anonymous gifts to the Hsu laboratory; V. Gupta; and R. Tonsing.

Nguyen E, Poli M, Durrant MG, Kang B, Katrekar D, Li DB, Bartie LJ, Thomas AW, King SH, Brixi G, Sullivan J, Ng MY, Lewis A, Lou A, Ermon S, Baccus SA, Hernandez-Boussard T, Ré C, Hsu PD, Hie BL.
Sequence modeling and design from molecular to genome scale with Evo.
Science. 2024 Nov 15;386(6723):eado9336. doi: 10.1126/science.ado9336
