Deep Machine Learning Completes Information about the Bioactivity of One Million Molecules

The Structural Bioinformatics and Network Biology laboratory, led by ICREA Researcher Dr. Patrick Aloy, has completed the bioactivity information for a million molecules using deep machine-learning computational models. It has also released a tool that predicts the biological activity of any molecule, even when no experimental data are available.

This new methodology is based on the Chemical Checker, the largest database of bioactivity profiles for pseudo pharmaceuticals to date, developed by the same laboratory and published in 2020. The Chemical Checker collects information from 25 spaces of bioactivity for each molecule. These spaces are linked to the chemical structure of the molecule, the targets with which it interacts, or the changes it induces at the clinical or cellular level. However, this highly detailed information about the mechanism of action is incomplete for most molecules: for any given molecule, there may be information for one or two spaces of bioactivity but not for all 25.

With this new development, researchers integrate all the experimental information available with deep machine learning methods, so that all the activity profiles, from chemistry to clinical level, for all molecules can be completed.
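The coverage gap described above can be pictured with a minimal Python sketch. Everything in it is illustrative: the space labels (five levels times five sublevels), the `missing_spaces` helper, and the toy values are assumptions made for this example, not the laboratory's software or data.

```python
from typing import Dict, List, Optional

# Hypothetical labels for the 25 bioactivity spaces (5 levels x 5 sublevels).
LEVELS = "ABCDE"
SPACES: List[str] = [f"{level}{i}" for level in LEVELS for i in range(1, 6)]


def missing_spaces(profile: Dict[str, Optional[List[float]]]) -> List[str]:
    """Return the spaces for which a molecule has no experimental signature."""
    return [space for space in SPACES if profile.get(space) is None]


# Toy molecule with experimental data in only two spaces (values are made up).
toy_profile: Dict[str, Optional[List[float]]] = {
    "A1": [0.1, 0.9],
    "B4": [0.3, 0.2],
}

gaps = missing_spaces(toy_profile)
print(f"{len(SPACES) - len(gaps)} spaces measured, {len(gaps)} left to infer")
```

In this picture, the spaces returned by `missing_spaces` are the ones the deep learning models would have to fill in from the experimental data that do exist.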

"The new tool also allows us to forecast the bioactivity spaces of new molecules, and this is crucial in the drug discovery process as we can select the most suitable candidates and discard those that, for one reason or another, would not work," explains Dr. Aloy.

The software library is freely accessible to the scientific community at bioactivitysignatures.org and it will be regularly updated by the researchers as more biological activity data become available. With each update of experimental data in the Chemical Checker, artificial neural networks will also be revised to refine the estimates.

Predictions and reliability

The bioactivity data predicted by the model have a greater or lesser degree of reliability depending on various factors, including the volume of experimental data available and the characteristics of the molecule.

In addition to predicting aspects of activity at the biological level, the system developed by Dr. Aloy's team provides a measure of the degree of reliability of the prediction for each molecule. "All models are wrong, but some are useful! A measure of confidence allows us to better interpret the results and highlight which spaces of bioactivity of a molecule are accurate and in which ones an error rate can be contemplated," explains Dr. Martino Bertoni, first author of the work.
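As a rough illustration of how a per-space confidence measure might be used, the sketch below separates the spaces of one molecule into those considered reliable and those where an error should be expected. The scores, the 0.5 cut-off, and the `split_by_confidence` helper are hypothetical and are not taken from the published tool.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    confidence: Dict[str, float], threshold: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split spaces into (reliable, uncertain) using an assumed cut-off."""
    reliable = sorted(s for s, c in confidence.items() if c >= threshold)
    uncertain = sorted(s for s, c in confidence.items() if c < threshold)
    return reliable, uncertain


# Made-up confidence scores for four spaces of a single molecule.
toy_confidence = {"A1": 0.95, "B4": 0.80, "C3": 0.40, "E2": 0.15}

reliable, uncertain = split_by_confidence(toy_confidence)
print("reliable:", reliable)
print("uncertain:", uncertain)
```

A stricter or looser threshold would trade coverage against reliability; the appropriate value would depend on how the confidence measure is calibrated.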

Testing the system with the IRB Barcelona compound library

To validate the tool, the researchers searched the library of compounds at IRB Barcelona for those that could be good drug candidates to modulate the activity of a cancer-related transcription factor (SNAIL1), whose activity is almost impossible to modulate through the direct binding of drugs (it is considered an 'undruggable' target). From a first set of 17,000 compounds, the deep machine learning models identified 131 whose predicted characteristics (in their dynamics, interaction with target cells and proteins, etc.) fit the target.

The ability of these compounds to degrade SNAIL1 has been confirmed experimentally and it has been observed that, for a high percentage, this degradation capacity is consistent with what the models had predicted, thus validating the system.

This work was made possible by funding from the Government of Catalonia, the Spanish Ministry of Science and Innovation, the European Research Council, the European Commission, the State Research Agency and the ERDF.

Bertoni M, Duran-Frigola M, Badia-I-Mompel P, Pauls E, Orozco-Ruiz M, Guitart-Pla O, Alcalde V, Diaz VM, Berenguer-Llergo A, Brun-Heath I, Villegas N, de Herreros AG, Aloy P.
Bioactivity descriptors for uncharacterized chemical compounds.
Nat Commun. 2021 Jun 24;12(1):3932. doi: 10.1038/s41467-021-24150-4
