Benchmarking compound activity prediction for real-world drug discovery applications.
Commun Chem; 7(1): 127, 2024 Jun 04.
Article in En | MEDLINE | ID: mdl-38834746
ABSTRACT
Identifying active compounds for target proteins is fundamental to early drug discovery. Recently, data-driven computational methods have shown promise in predicting compound activities. However, a well-designed benchmark that comprehensively evaluates these methods from a practical perspective is still lacking. To fill this gap, we propose a Compound Activity benchmark for Real-world Applications (CARA). By carefully distinguishing assay types, designing train-test splitting schemes, and selecting evaluation metrics, CARA accounts for the biased distribution of current real-world compound activity data and avoids overestimating model performance. We observed that although current models can make successful predictions for a proportion of assays, their performance varied across assays. In addition, an evaluation of several few-shot training strategies showed that their performance differed by task type. Overall, we provide a high-quality dataset for developing and evaluating compound activity prediction models, and the analyses in this work may inspire better applications of data-driven models in drug discovery.
Database: MEDLINE
Language: En
Journal: Commun Chem
Year: 2024
Document type: Article
Affiliation country: China
Country of publication: United Kingdom