Results 1-4 of 4
1.
J Phys Chem A ; 126(34): 5837-5852, 2022 Sep 01.
Article in English | MEDLINE | ID: mdl-35984470

ABSTRACT

Organic semiconductors have many desirable properties, including improved manufacturability and flexible mechanical properties. Because chemical space is vast, it must be explored efficiently when designing new materials, including through the use of generative techniques. New generative machine learning methods for molecular design continue to be published in the literature at a significant rate, but successfully adapting these methods to new chemistry and problem domains remains difficult. These challenges necessitate continual method evaluation to probe whether methods are viable for applications not covered in the original works. Continuing our previous work, we evaluate four additional machine-learning-based de novo methods for generating molecules with high predicted hole mobility for use in semiconductor applications. The four generative methods evaluated here are (1) Molecule Deep Q-Networks (MolDQN), which uses deep Q-learning to directly optimize molecular structure graphs for desired properties instead of generating SMILES; (2) Graph-based Genetic Algorithm (GraphGA), which uses a genetic algorithm for optimization in which crossovers and mutations are defined in terms of RDKit's reaction SMILES; (3) Generative Tensorial Reinforcement Learning (GENTRL), a variational autoencoder (VAE) with a learned prior distribution that is optimized using reinforcement learning; and (4) ChemTS, which combines Monte Carlo tree search exploration of chemical space with a recurrent neural network (RNN) decoder. The generated molecules were evaluated using density functional theory (DFT), and the GraphGA method discovered better-performing molecules than the other approaches.
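The GraphGA entry relies on the basic genetic-algorithm loop of selection, crossover, and mutation. The Python sketch below illustrates only that generic loop, under loudly stated assumptions: genomes are fixed-length bit lists rather than molecular graphs, `toy_fitness` is a hypothetical stand-in for a predicted property such as hole mobility, and the operators are generic single-point crossover and bit-flip mutation, not GraphGA's RDKit reaction-SMILES operators. It is not the evaluated authors' implementation.

```python
import random
from typing import Callable, List

Genome = List[int]  # hypothetical encoding: fixed-length bit list, not a molecular graph


def toy_fitness(genome: Genome) -> float:
    """Hypothetical stand-in for a predicted property: the number of 1 bits."""
    return float(sum(genome))


def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Single-point crossover: a prefix of parent `a` joined to the suffix of parent `b`."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(genome: Genome, rate: float, rng: random.Random) -> Genome:
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def genetic_algorithm(
    fitness: Callable[[Genome], float],
    length: int = 20,
    pop_size: int = 30,
    generations: int = 40,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Genome:
    rng = random.Random(seed)
    # Random initial population.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fittest half survives unchanged (elitism) and breeds.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), mutation_rate, rng))
        population = parents + children
    return max(population, key=fitness)


best = genetic_algorithm(toy_fitness)
print(toy_fitness(best))
```

Because the fittest half of each generation is carried over unchanged, the best fitness never decreases between generations. In a molecular setting, the fitness call (a property model or a DFT calculation) would typically dominate the cost of each generation.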

2.
J Phys Chem A ; 125(33): 7331-7343, 2021 Aug 26.
Article in English | MEDLINE | ID: mdl-34342466

ABSTRACT

Materials exhibiting higher mobilities than conventional organic semiconducting materials, such as fullerenes and fused thiophenes, are in high demand for applications in printed electronics. To discover new molecules in the heteroacene family that might show improved hole mobility, three de novo design methods were applied. Machine learning (ML) models were trained on previously calculated hole reorganization energies for a quarter million heteroacenes; these energies had been calculated with density functional theory (DFT) in a massive cloud computing environment. The three generative methods applied were (1) the continuous space method, in which molecular structures are converted into continuous variables using the variational autoencoder/decoder technique; (2) reinforcement learning on SMILES strings (the REINVENT method); and (3) the junction tree variational autoencoder method, which directly generates molecular graphs. Of the three methods, the second and third succeeded in producing chemical structures whose DFT-calculated hole reorganization energy was lower than the lowest energy in the training dataset. This suggests that an extrapolative materials design protocol can be developed by applying generative modeling to a quantitative structure-property relationship (QSPR) utility function.
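The extrapolation criterion in this entry (a DFT-calculated hole reorganization energy below the lowest value in the training data) amounts to a simple filter. In this Python sketch, the function name `extrapolative_hits`, the candidate IDs, and the energy values are hypothetical placeholders, not data from the study.

```python
from typing import Dict, List


def extrapolative_hits(training_energies: List[float], generated: Dict[str, float]) -> List[str]:
    """Return IDs of generated candidates whose computed energy falls below the
    lowest value seen in training (lower reorganization energy is the target here)."""
    floor = min(training_energies)
    return sorted(cid for cid, energy in generated.items() if energy < floor)


# Hypothetical toy values (eV); not data from the study.
training = [0.21, 0.18, 0.25]
candidates = {"cand_a": 0.17, "cand_b": 0.19, "cand_c": 0.15}
print(extrapolative_hits(training, candidates))  # ['cand_a', 'cand_c']
```

A candidate that merely beats most training examples does not count; only values below the training minimum indicate extrapolation in this sense.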

3.
J Chem Inf Model ; 60(9): 4311-4325, 2020 09 28.
Article in English | MEDLINE | ID: mdl-32484669

ABSTRACT

The hit identification process usually involves profiling millions, and more recently billions, of compounds, either via traditional experimental high-throughput screens (HTS) or computational virtual high-throughput screens (vHTS). We have previously demonstrated that, by coupling reaction-based enumeration, active learning, and free energy calculations, a similarly large-scale exploration of chemical space can be extended to the hit-to-lead process. In this work, we augment that approach by coupling large-scale enumeration and cloud-based free energy perturbation (FEP) profiling with goal-directed generative machine learning, which yields a higher enrichment of potent ideas than large-scale enumeration alone while staying within the bounds of a predefined drug-like property space. We achieve this by building the molecular distribution for generative machine learning from the PathFinder rules-based enumeration and optimizing for a weighted-sum, QSAR-based multiparameter optimization function. We examine the utility of this combined approach by designing potent inhibitors of cyclin-dependent kinase 2 (CDK2) and demonstrate a coupled workflow that can (1) provide a 6.4-fold enrichment improvement in identifying <10 nM compounds over random selection and a 1.5-fold enrichment in identifying <10 nM compounds over our previous method, (2) rapidly explore relevant chemical space outside the bounds of commercial reagents, (3) use generative ML approaches to "learn" the SAR from large-scale in silico enumerations and generate novel idea molecules for a flexible receptor site that are both potent and within relevant physicochemical space, and (4) produce over 3,000,000 idea molecules and run 1935 FEP simulations, identifying 69 ideas with a predicted IC50 < 10 nM and 358 ideas with a predicted IC50 < 100 nM.
The reported data suggest that combining reaction-based enumeration with generative machine learning for ideation results in a higher enrichment of potent compounds than previously described approaches and has the potential to rapidly accelerate the discovery of novel chemical matter within a predefined potency and property space.
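This entry relies on two quantities that are easy to state concretely: a weighted-sum multiparameter optimization (MPO) score and an enrichment factor relative to random selection. The Python sketch below assumes each property has already been mapped to a desirability in [0, 1]; the function names, property names, weights, and counts are hypothetical, and the paper's actual QSAR models and weights are not reproduced.

```python
from typing import Dict


def weighted_sum_mpo(desirability: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted-sum MPO score, normalized by the total weight.

    Assumes every property has already been mapped to a desirability in [0, 1]
    (for example, from a QSAR prediction); the result is then also in [0, 1].
    """
    total = sum(weights.values())
    return sum(w * desirability[prop] for prop, w in weights.items()) / total


def enrichment_factor(selected_hits: int, selected_total: int, pool_hits: int, pool_total: int) -> float:
    """Hit rate in the selected subset divided by the hit rate expected from random selection."""
    return (selected_hits / selected_total) / (pool_hits / pool_total)


# Hypothetical inputs for illustration only.
score = weighted_sum_mpo({"potency": 0.9, "solubility": 0.5}, {"potency": 2.0, "solubility": 1.0})
ef = enrichment_factor(selected_hits=8, selected_total=100, pool_hits=50, pool_total=5000)
print(round(score, 3), round(ef, 1))
```

Under this definition, a 6.4-fold enrichment over random selection means the selected set contains 6.4 times the fraction of <10 nM compounds that random picking from the same pool would be expected to give.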


Subject(s)
Drug Discovery, Pharmaceutical Preparations, Computer Simulation, Goals, Machine Learning
4.
J Chem Inf Model ; 59(3): 1017-1029, 2019 03 25.
Article in English | MEDLINE | ID: mdl-30758950

ABSTRACT

Chemical structure extraction from documents remains a hard problem because of both false-positive identification of structures during segmentation and errors in the predicted structures. Current approaches rely on handcrafted rules and subroutines that perform reasonably well in general but still routinely encounter situations where recognition rates are unsatisfactory and systematic improvement is challenging. Complications affecting the performance of current approaches include the diversity of visual styles used by different software to render structures, the frequent use of ad hoc annotations, and other image-quality challenges, including resolution and noise. We present end-to-end deep learning solutions both for segmenting molecular structures from documents and for predicting chemical structures from the segmented images. This deep-learning-based approach requires no handcrafted features, is learned directly from data, and is robust against variations in image quality and style. Using this approach, we show that it is possible to perform well on both segmentation and prediction for low-resolution images containing moderately sized molecules found in journal articles and patents.


Subject(s)
Deep Learning, Drug Discovery/methods, Data Mining, Documentation