PharmaBench: Enhancing ADMET benchmarks with large language models.
Sci Data; 11(1): 985, 2024 Sep 10.
Article | En | MEDLINE | ID: mdl-39256394
ABSTRACT
Accurately predicting ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) properties early in drug development is essential for selecting compounds with optimal pharmacokinetics and minimal toxicity. Existing ADMET-related benchmark sets are limited in utility due to their small dataset sizes and the lack of representation of compounds used in drug discovery projects. These shortcomings hinder their application in model building for drug discovery. To address this issue, we propose a multi-agent data mining system based on Large Language Models that effectively identifies experimental conditions within 14,401 bioassays. This approach facilitates merging entries from different sources, culminating in the creation of PharmaBench. Additionally, we have developed a data processing workflow to integrate data from various sources, resulting in 156,618 raw entries. Through this workflow, we constructed PharmaBench, a comprehensive benchmark set for ADMET properties, which comprises eleven ADMET datasets and 52,482 entries. This benchmark set is designed to serve as an open-source dataset for the development of AI models relevant to drug discovery projects.
Full text: 1
Collection: 01-internacional
Database: MEDLINE
Main subject: Benchmarking / Drug Discovery
Limits: Humans
Language: En
Journal: Sci Data
Year: 2024
Document type: Article
Country of affiliation: China
Country of publication: United Kingdom