To carry out this analysis, the multidisciplinary team developed Serratus, a cloud computing infrastructure on Amazon Web Services (AWS) that, using a cluster of 22,500 computer processors (CPUs), enabled massive searches for viral sequences in the millions of gigabytes (petabytes) of sequencing data available in public databases.
Detailed analysis of certain viral families led to the discovery of more than 30 new coronavirus species. These include interesting examples in aquatic vertebrates such as fish and amphibians, whose coronaviruses have a genome segmented into two fragments, a feature that has been described in other virus families but had not previously been detected in any coronavirus.
At the Institute for Plant Molecular and Cellular Biology (IBMCP), located in the Polytechnic City of Innovation, UPV scientists used Serratus to analyse the virus that causes human hepatitis D, a viral agent called Delta that has a minimal genome size and an unknown origin. This allowed Marcos de la Peña Rivero, a CSIC researcher at the IBMCP, to detect similar viruses in a multitude of other animals, including not only mammals and other vertebrates but also invertebrates. "Surprisingly, these viruses were also found in environmental samples collected from lakes and soils all over the world, and their hosts are unknown for the time being," reveals De la Peña.
Evolutionary connection between human and plant viruses in the environment
Moreover, environmental samples containing hepatitis D-like viruses revealed novel viral forms with ultra-compact genomes of minute size (only 300 bases, the chemical units that make up the genetic material). "This discovery allows us to advance a close evolutionary connection between viruses as distant as human hepatitis D and plant subviral agents called viroids," says the CSIC researcher.
Both the database of all the viruses obtained in the course of this study and the set of tools developed are freely and openly available (http://www.serratus.io). These tools can be of great use in characterising the diversity of all viruses existing on our planet and in preparing the world for possible new pandemics, the devastating consequences of which we are now suffering with emerging viral diseases such as COVID-19, caused by the SARS-CoV-2 coronavirus.
The IBMCP is the only Spanish scientific institution participating in this research, in which the Heidelberg Institute for Theoretical Studies and the Max Planck Institute for Biology (Germany), the Pasteur Institute (France), the University of St. Petersburg (Russia), the University of California, Berkeley (USA) and the University of British Columbia (Canada), among others, also take part.
Edgar RC, Taylor J, Lin V, Altman T, Barbera P, Meleshko D, Lohr D, Novakovsky G, Buchfink B, Al-Shayeb B, Banfield JF, de la Peña M, Korobeynikov A, Chikhi R, Babaian A.
Petabase-scale sequence alignment catalyses viral discovery.
Nature. 2022 Jan 26. doi: 10.1038/s41586-021-04332-2.