Mining for Potent Inhibitors through Artificial Intelligence and Physics: A Unified Methodology for Ligand Based and Structure Based Drug Design.

Li, Jie; Zhang, Oufan; Sun, Kunyang; Wang, Yingze; Guan, Xingyi; Bagni, Dorian; Haghighatlari, Mojtaba; Kearns, Fiona L; Parks, Conor; Amaro, Rommie E; Head-Gordon, Teresa.

J Chem Inf Model ; 2024 Jun 06.

Article in English | MEDLINE | ID: mdl-38843070

ABSTRACT

Determining the viability of a new drug molecule is a time- and resource-intensive task, which makes computer-aided assessment a vital approach to rapid drug discovery. Here we develop a machine learning algorithm, iMiner, that generates novel inhibitor molecules for target proteins by combining deep reinforcement learning with real-time 3D molecular docking using AutoDock Vina, thereby simultaneously creating chemical novelty while constraining molecules for shape and molecular compatibility with target active sites. Moreover, through the use of various types of reward functions, we have introduced novel generative tasks for new molecules, such as enforcing chemical similarity to a target ligand, growing molecules from known protein-bound fragments, and creating molecules that enforce interactions with target residues in the protein active site. The iMiner algorithm is embedded in a composite workflow that filters out pan-assay interference compounds (PAINS), Lipinski rule violations, uncommon structures in medicinal chemistry, and molecules with poor synthetic accessibility, with options for cross-validation against other docking scoring functions and automation of a molecular dynamics simulation to measure pose stability. We also allow users to define a set of rules for the structures they would like to exclude during the training process and postfiltering steps. Because our approach relies only on the structure of the target protein, iMiner can be easily adapted for the future development of other inhibitors or small-molecule therapeutics for any target protein.
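As an illustration of one postfiltering step named in the abstract, the sketch below counts Lipinski rule-of-five violations from precomputed descriptors. It is a minimal example, not the iMiner implementation: the `Descriptors` class, the function names, and the `max_violations` parameter are hypothetical, and the descriptors are assumed to have been computed beforehand with a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = (
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    )
    return sum(checks)


def passes_lipinski(d: Descriptors, max_violations: int = 0) -> bool:
    """Keep a molecule if it exceeds at most `max_violations` limits.

    The tolerance is an assumption; the abstract does not state how
    many violations the workflow accepts.
    """
    return lipinski_violations(d) <= max_violations


if __name__ == "__main__":
    candidates = {
        "small_druglike": Descriptors(350.4, 2.1, 2, 5),
        "large_greasy": Descriptors(720.0, 6.3, 2, 12),
    }
    for name, desc in candidates.items():
        print(name, lipinski_violations(desc), passes_lipinski(desc))
```

A filter of this kind would sit alongside the PAINS, substructure, and synthetic-accessibility checks, each of which could be written as a separate predicate over the generated molecules.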

Leak Proof PDBBind: A Reorganized Dataset of Protein-Ligand Complexes for More Generalizable Binding Affinity Prediction.

Li, Jie; Guan, Xingyi; Zhang, Oufan; Sun, Kunyang; Wang, Yingze; Bagni, Dorian; Head-Gordon, Teresa.

ArXiv ; 2023 Aug 18.

Article in English | MEDLINE | ID: mdl-37645037

ABSTRACT

Many physics-based and machine-learned scoring functions (SFs) used to predict protein-ligand binding free energies have been trained on the PDBBind dataset. However, it is debated whether new SFs are actually improving, since the general, refined, and core datasets of PDBBind are cross-contaminated with highly similar proteins and ligands, and hence SFs trained on them may not perform comparably well when predicting binding for new protein-ligand complexes. In this work we have carefully prepared a cleaned PDBBind dataset of non-covalent binders that is split into training, validation, and test datasets to control for data leakage. The resulting leak-proof (LP)-PDBBind data is used to retrain four popular SFs: AutoDock Vina, Random Forest (RF)-Score, InteractionGraphNet (IGN), and DeepDTA, to better test their capabilities when applied to new protein-ligand complexes. In particular, we have formulated a new independent dataset, BDB2020+, by matching high-quality binding free energies from BindingDB with co-crystallized ligand-protein complexes from the PDB that have been deposited since 2020. Based on all the benchmark results, the retrained models using LP-PDBBind that rely on 3D information perform consistently among the best, with IGN especially being recommended for scoring and ranking applications for new protein-ligand systems.
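The idea of controlling data leakage by similarity can be sketched as follows. This is an illustrative example only, not the LP-PDBBind procedure: it considers ligand similarity alone, represents each ligand fingerprint as a set of on-bit indices, and uses a hypothetical threshold of 0.5. The function names and example entries are assumptions.

```python
from typing import Dict, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def drop_leaky_training(
    train: Dict[str, Set[int]],
    test: Dict[str, Set[int]],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> Dict[str, Set[int]]:
    """Keep only training entries whose similarity to every test entry
    is below `threshold`."""
    kept = {}
    for name, fingerprint in train.items():
        if all(tanimoto(fingerprint, t) < threshold for t in test.values()):
            kept[name] = fingerprint
    return kept


if __name__ == "__main__":
    train = {"ligand_a": {1, 2, 3, 4}, "ligand_b": {10, 11, 12}}
    test = {"ligand_t": {1, 2, 3, 5}}
    # ligand_a shares 3 of 5 distinct bits with ligand_t, so it is removed.
    print(sorted(drop_leaky_training(train, test)))
```

A fuller treatment would apply an analogous check to protein similarity as well, since the abstract describes contamination in both proteins and ligands.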