Current genomic technologies make it very easy to determine the amino acid sequence of a protein, but determining its 3D shape demands expensive and time-consuming experimental procedures that are not always successful. For decades, researchers have tried to understand what makes a protein fold into a particular shape, so that the shape can be predicted from the amino acid sequence.
AlphaFold 2 is a neural network developed by DeepMind, a Google-owned artificial intelligence company, and trained specifically to predict the 3D structure of proteins accurately from their amino acid sequences. Its accuracy impressed the scientific community a few years ago, after its victories at CASP, the international contest on protein structure modelling held every two years, when its team presented the full proteome for 11 different species, including humans.
To put all the data released by AlphaFold 2 into context (over 300,000 models and growing), a community of independent researchers, including Dr. Eduard Porta, head of the Cancer Immunogenetics group at the Josep Carreras Leukaemia Research Institute, compared the newly released structures with those already available and concluded that AlphaFold 2 contributes an extra 25% of high-quality protein structures for any given species. Their analysis has recently been published in the prestigious journal Nature Structural & Molecular Biology.
The key role that many proteins play in diseases such as cancer is already known, but the lack of deep knowledge of how they work at the molecular level prevents the development of specific strategies against them. Structural information will help scientists understand these proteins much better, learn which other molecules they may interact with inside the cell, and design new drugs capable of interfering with their function when they are altered.
There are limitations, of course, to the capabilities of AlphaFold 2. The community team found that the algorithm has problems when trying to recreate protein complexes. Most proteins work together with other proteins to carry out a biological function, so predicting how different proteins stick together would be highly desirable. Another limitation identified is its inability to show the structure of mutated proteins, those with altered amino acids in their sequence. Mutations often result in abnormal protein function and are the cause of many diseases, including cancer.
Despite its limitations, however, the team recognizes the outstanding contribution of AlphaFold 2 to the community, which will greatly impact basic and biomedical research in the coming years. It will do so not only through its direct contribution (thousands of new reliable 3D protein models), but also by starting a new era of computational tools based on artificial intelligence, able to yield results that no one can anticipate.
As a matter of fact, this era has already started: recently, a team at Meta (formerly Facebook) used a modified version of its natural language predictor to "autocomplete" proteins. This AI tool, called ESMFold, seems to be less accurate than its Google counterpart, but it is 60 times faster and can overcome some of the limitations identified in AlphaFold 2, such as handling mutated sequences.
All in all, as the authors of the publication conclude, "the application of AlphaFold2 [and the coming tools] will have a transformative impact in life sciences."
Akdel M, Pires DEV, Pardo EP, Jänes J, Zalevsky AO, Mészáros B, Bryant P, Good LL, Laskowski RA, Pozzati G, Shenoy A, Zhu W, Kundrotas P, Serra VR, Rodrigues CHM, Dunham AS, Burke D, Borkakoti N, Velankar S, Frost A, Basquin J, Lindorff-Larsen K, Bateman A, Kajava AV, Valencia A, Ovchinnikov S, Durairaj J, Ascher DB, Thornton JM, Davey NE, Stein A, Elofsson A, Croll TI, Beltrao P.
A structural biology community assessment of AlphaFold2 applications.
Nat Struct Mol Biol. 2022 Nov 7. doi: 10.1038/s41594-022-00849-w