ABSTRACT
Multitask deep neural networks learn to predict ligand-target binding by example, yet public pharmacological data sets are sparse, imbalanced, and approximate. We constructed two hold-out benchmarks to approximate temporal and drug-screening test scenarios, whose characteristics differ from a random split of conventional training data sets. We developed a pharmacological data set augmentation procedure, Stochastic Negative Addition (SNA), which randomly assigns untested molecule-target pairs as transient negative examples during training. Under the SNA procedure, drug-screening benchmark performance increases from R² = 0.1926 ± 0.0186 to 0.4269 ± 0.0272 (a 122% increase). This gain was accompanied by a modest decrease in temporal benchmark performance (13%). The SNA gains in drug-screening performance were consistent across classification and regression tasks and exceeded those of y-randomized controls. Our results highlight where data and feature uncertainty may be problematic and how incorporating uncertainty into training improves predictions of drug-target relationships.
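The SNA idea described in the abstract can be illustrated with a minimal sketch. It assumes a dense molecule-by-target label matrix in which untested pairs are stored as NaN; the function name, the `ratio` parameter, and the `negative_value` placeholder are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def stochastic_negative_addition(y, rng, ratio=1.0, negative_value=0.0):
    """Return a copy of `y` with random untested (NaN) pairs marked negative.

    y              : 2D float array, molecules x targets, NaN = untested pair
    rng            : numpy.random.Generator used for sampling
    ratio          : assumed number of added negatives per known label
    negative_value : assumed label assigned to sampled negatives
    """
    y_aug = y.copy()
    untested = np.argwhere(np.isnan(y))           # (row, col) of untested pairs
    n_known = int(np.count_nonzero(~np.isnan(y)))
    n_add = min(len(untested), int(round(ratio * n_known)))
    if n_add > 0:
        picks = rng.choice(len(untested), size=n_add, replace=False)
        chosen = untested[picks]
        y_aug[chosen[:, 0], chosen[:, 1]] = negative_value
    return y_aug


# Toy label matrix: two molecules, three targets, two measured values.
labels = np.array([[5.0, np.nan, np.nan],
                   [np.nan, 7.0, np.nan]])
generator = np.random.default_rng(0)

# Resampling once per epoch keeps the added negatives transient:
# the original `labels` array is never modified.
for epoch in range(3):
    epoch_labels = stochastic_negative_addition(labels, generator)
```

Because a fresh set of untested pairs is drawn on every call, no single assumed negative persists across the whole training run, which is one plausible reading of "transient" in the description above.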
Subjects
Machine Learning, Computer Neural Networks
ABSTRACT
Statistical and machine learning approaches predict drug-to-target relationships from 2D small-molecule topology patterns. One might expect 3D information to improve these calculations. Here we apply the logic of the extended connectivity fingerprint (ECFP) to develop a rapid, alignment-invariant 3D representation of molecular conformers, the extended three-dimensional fingerprint (E3FP). By integrating E3FP with the similarity ensemble approach (SEA), we achieve higher precision-recall performance than SEA with ECFP on ChEMBL20, and equivalent receiver operating characteristic performance. We identify classes of molecules for which E3FP is a better predictor of similarity in bioactivity than ECFP. Finally, we report novel drug-to-target binding predictions inaccessible to 2D fingerprints and confirm three of them experimentally, with ligand efficiencies from 0.442 to 0.637 kcal/mol/heavy atom.
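For readers unfamiliar with the ligand efficiency unit quoted above, the following sketch shows the standard definition: binding free energy divided by the number of heavy atoms, with the free energy estimated as RT ln(Kd). The function name and the input values are hypothetical, and the abstract does not state which affinity measure or temperature the authors used, so this is an illustration of the unit rather than a reproduction of their calculation.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 0.0019872


def ligand_efficiency(kd_molar, heavy_atoms, temperature_k=298.15):
    """Ligand efficiency in kcal/mol per heavy atom.

    kd_molar      : dissociation constant in mol/L (assumed affinity measure)
    heavy_atoms   : number of non-hydrogen atoms in the ligand
    temperature_k : assumed temperature in kelvin
    """
    delta_g = R_KCAL_PER_MOL_K * temperature_k * math.log(kd_molar)
    return -delta_g / heavy_atoms


# Hypothetical ligand: 1 micromolar affinity, 20 heavy atoms.
example_le = ligand_efficiency(1e-6, 20)
```

Under this definition, a smaller ligand with the same affinity has a higher ligand efficiency.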