Search | VHL Regional Portal

1.

Machine learning the ropes: principles, applications and directions in synthetic chemistry.

Strieth-Kalthoff, Felix; Sandfort, Frederik; Segler, Marwin H S; Glorius, Frank.

Chem Soc Rev ; 49(17): 6154-6168, 2020 Sep 01.

Article in English | MEDLINE | ID: mdl-32672294

ABSTRACT

Machine learning (ML) has emerged as a general problem-solving paradigm with many applications in computer vision, natural language processing, digital safety, and medicine. By recognizing complex patterns in data, ML has the potential to modernise how many chemical challenges are approached. In this review, an introduction to ML is given from the perspective of synthetic chemistry: starting from the fundamentals regarding algorithms and best-practice workflows, the review covers different applications of machine learning in synthesis planning, property prediction, molecular design, and reactivity prediction. In particular, different approaches to representing and utilizing organic molecules will be discussed, providing synthetic chemists both with the understanding and the tools required to apply machine learning in the context of their research, and with pointers for further study.

2.

GuacaMol: Benchmarking Models for de Novo Molecular Design.

Brown, Nathan; Fiscato, Marco; Segler, Marwin H S; Vaucher, Alain C.

J Chem Inf Model ; 59(3): 1096-1108, 2019 03 25.

Article in English | MEDLINE | ID: mdl-30887799

ABSTRACT

De novo design seeks to generate molecules with required property profiles by virtual design-make-test cycles. With the emergence of deep learning and neural generative models in many application areas, models for molecular design based on neural networks have appeared recently and show promising results. However, the new models have not been profiled on consistent tasks, and comparative studies against well-established algorithms have only seldom been performed. To standardize the assessment of both classical and neural models for de novo molecular design, we propose an evaluation framework, GuacaMol, based on a suite of standardized benchmarks. The benchmark tasks encompass measuring the fidelity of the models to reproduce the property distribution of the training sets, the ability to generate novel molecules, the exploration and exploitation of chemical space, and a variety of single and multiobjective optimization tasks. The open-source Python benchmarking code and a leaderboard can be found at https://benevolent.ai/guacamol .
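Two of the distribution-learning properties named in this abstract, generating distinct molecules and generating novel ones, reduce to simple set arithmetic once molecules are in a canonical form. The sketch below is a simplified illustration, not the GuacaMol package's API: the function names are hypothetical, and it assumes the inputs are already valid, canonicalised SMILES strings. The published benchmark also fixes details such as sample size, which are omitted here.

```python
from typing import List, Set

# Simplified sketch: inputs are assumed to be valid, already-canonicalised
# SMILES strings. A real benchmark canonicalises molecules with a
# cheminformatics toolkit before comparing them; that step is omitted here.


def uniqueness(valid_samples: List[str]) -> float:
    """Fraction of distinct molecules among the valid generated samples."""
    if not valid_samples:
        return 0.0
    return len(set(valid_samples)) / len(valid_samples)


def novelty(valid_samples: List[str], training_set: Set[str]) -> float:
    """Fraction of distinct generated molecules that are absent from the training set."""
    distinct = set(valid_samples)
    if not distinct:
        return 0.0
    return len(distinct - training_set) / len(distinct)


if __name__ == "__main__":
    training = {"CCO", "c1ccccc1", "CC(=O)O"}
    generated = ["CCO", "CCN", "CCN", "c1ccncc1"]
    print(uniqueness(generated))         # 3 distinct out of 4 samples -> 0.75
    print(novelty(generated, training))  # 2 of the 3 distinct are new -> 0.666...
```

Computing novelty over the distinct samples keeps the two metrics independent: duplicated outputs lower uniqueness but do not inflate or deflate novelty.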

Subject(s)

Benchmarking/methods , Deep Learning , Pharmaceutical Preparations/chemistry , Drug Design , Isomerism , Models, Molecular , Molecular Structure , Monte Carlo Method , Quantitative Structure-Activity Relationship

3.

Opportunities and obstacles for deep learning in biology and medicine.

Ching, Travers; Himmelstein, Daniel S; Beaulieu-Jones, Brett K; Kalinin, Alexandr A; Do, Brian T; Way, Gregory P; Ferrero, Enrico; Agapow, Paul-Michael; Zietz, Michael; Hoffman, Michael M; Xie, Wei; Rosen, Gail L; Lengerich, Benjamin J; Israeli, Johnny; Lanchantin, Jack; Woloszynek, Stephen; Carpenter, Anne E; Shrikumar, Avanti; Xu, Jinbo; Cofer, Evan M; Lavender, Christopher A; Turaga, Srinivas C; Alexandari, Amr M; Lu, Zhiyong; Harris, David J; DeCaprio, Dave; Qi, Yanjun; Kundaje, Anshul; Peng, Yifan; Wiley, Laura K; Segler, Marwin H S; Boca, Simina M; Swamidass, S Joshua; Huang, Austin; Gitter, Anthony; Greene, Casey S.

J R Soc Interface ; 15(141), 2018 04.

Article in English | MEDLINE | ID: mdl-29618526

ABSTRACT

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solving problems in these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

Subject(s)

Biomedical Research/trends , Biomedical Technology/trends , Deep Learning/trends , Algorithms , Biomedical Research/methods , Decision Making , Delivery of Health Care/methods , Delivery of Health Care/trends , Disease/genetics , Drug Design , Electronic Health Records/trends , Humans , Terminology as Topic

4.

Planning chemical syntheses with deep neural networks and symbolic AI.

Segler, Marwin H S; Preuss, Mike; Waller, Mark P.

Nature ; 555(7698): 604-610, 2018 03 28.

Article in English | MEDLINE | ID: mdl-29595767

ABSTRACT

To plan the syntheses of small organic molecules, chemists use retrosynthesis, a problem-solving technique in which target molecules are recursively transformed into increasingly simpler precursors. Computer-aided retrosynthesis would be a valuable tool, but at present it is slow and provides results of unsatisfactory quality. Here we use Monte Carlo tree search and symbolic artificial intelligence (AI) to discover retrosynthetic routes. We combined Monte Carlo tree search with an expansion policy network that guides the search, and a filter network to pre-select the most promising retrosynthetic steps. These deep neural networks were trained on essentially all reactions ever published in organic chemistry. Our system solves almost twice as many molecules, thirty times faster, than the traditional computer-aided search method, which is based on extracted rules and hand-designed heuristics. In a double-blind A/B test, chemists on average considered our computer-generated routes to be equivalent to reported literature routes.
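The abstract describes Monte Carlo tree search guided by an expansion policy network but does not state the selection rule. The sketch below assumes a PUCT-style rule of the kind popularised by AlphaGo-style searches: a child's mean reward plus an exploration bonus scaled by the policy network's prior. The exact formula and the exploration constant `c` used in the paper may differ; the prior values are hypothetical stand-ins for policy-network outputs, and the filter network and rollouts are not shown.

```python
import math
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    prior: float               # P(s, a): probability assumed to come from a policy network
    visits: int = 0            # N(s, a): how often the search passed through this node
    total_value: float = 0.0   # W(s, a): sum of rewards backed up through this node
    children: List["Node"] = field(default_factory=list)

    def mean_value(self) -> float:
        return self.total_value / self.visits if self.visits else 0.0


def puct_score(child: Node, parent_visits: int, c: float = 3.0) -> float:
    """Exploitation (mean reward) plus exploration weighted by the policy prior."""
    exploration = c * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.mean_value() + exploration


def select_child(parent: Node, c: float = 3.0) -> Node:
    """Tree policy: descend into the child with the highest PUCT score."""
    return max(parent.children, key=lambda ch: puct_score(ch, parent.visits, c))


def backup(path: List[Node], reward: float) -> None:
    """Propagate a rollout reward (e.g. 1.0 if a route was found) up the visited path."""
    for node in path:
        node.visits += 1
        node.total_value += reward


if __name__ == "__main__":
    a = Node(prior=0.7, visits=5, total_value=2.0)  # explored step, mean reward 0.4
    b = Node(prior=0.2)                              # unexplored step
    root = Node(prior=1.0, visits=10, children=[a, b])
    print(select_child(root) is b)  # True: B's exploration bonus outweighs A's score
```

The prior term lets a learned policy focus the search on chemically plausible disconnections, while the `1 + visits` denominator shrinks the bonus as a branch is explored.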

Subject(s)

Artificial Intelligence , Chemistry Techniques, Synthetic/methods , Neural Networks, Computer , Chemistry, Organic/methods , Monte Carlo Method

5.

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks.

Segler, Marwin H S; Kogej, Thierry; Tyrchan, Christian; Waller, Mark P.

ACS Cent Sci ; 4(1): 120-131, 2018 Jan 24.

Article in English | MEDLINE | ID: mdl-29392184

ABSTRACT

In de novo drug design, computational strategies are used to generate novel molecules with good affinity to the desired biological target. In this work, we show that recurrent neural networks can be trained as generative models for molecular structures, similar to statistical language models in natural language processing. We demonstrate that the properties of the generated molecules correlate very well with the properties of the molecules used to train the model. In order to enrich libraries with molecules active toward a given biological target, we propose to fine-tune the model with small sets of molecules, which are known to be active against that target. Against Staphylococcus aureus, the model reproduced 14% of 6051 hold-out test molecules that medicinal chemists designed, whereas against Plasmodium falciparum (Malaria), it reproduced 28% of 1240 test molecules. When coupled with a scoring function, our model can perform the complete de novo drug design cycle to generate large sets of novel molecules for drug discovery.
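This abstract frames molecule generation as language modelling over SMILES strings: the model predicts the next character from the preceding ones and is sampled one character at a time. The sketch below shows only that sampling loop, with a character-level bigram model swapped in for the recurrent network. Because a bigram conditions on just one previous character, it will often emit invalid SMILES (for example, unclosed rings), which an RNN is far better at avoiding. The start/end markers and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List

START, END = "^", "$"  # hypothetical sequence-boundary tokens


def fit_bigram(smiles: List[str]) -> Dict[str, Counter]:
    """Count how often each character follows each other character."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for s in smiles:
        tokens = [START] + list(s) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample(counts: Dict[str, Counter], rng: random.Random, max_len: int = 50) -> str:
    """Generate one string by repeatedly sampling the next character."""
    out: List[str] = []
    prev = START
    for _ in range(max_len):
        options = counts[prev]
        symbols = list(options)
        # Sample the next character in proportion to observed transition counts.
        nxt = rng.choices(symbols, weights=[options[s] for s in symbols], k=1)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


if __name__ == "__main__":
    model = fit_bigram(["CCO", "CCN", "c1ccccc1O"])
    rng = random.Random(0)
    print([sample(model, rng) for _ in range(5)])
```

Swapping the count table for a trained recurrent network changes how the next-character distribution is computed, but the generate-until-end-token loop stays the same.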

6.

Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction.

Segler, Marwin H S; Waller, Mark P.

Chemistry ; 23(25): 5966-5971, 2017 May 02.

Article in English | MEDLINE | ID: mdl-28134452

ABSTRACT

Reaction prediction and retrosynthesis are the cornerstones of organic chemistry. Rule-based expert systems have been the most widespread approach to computationally solve these two related challenges to date. However, reaction rules often fail because they ignore the molecular context, which leads to reactivity conflicts. Herein, we report that deep neural networks can learn to resolve reactivity conflicts and to prioritize the most suitable transformation rules. We show that by training our model on 3.5 million reactions taken from the collective published knowledge of the entire discipline of chemistry, our model exhibits a top-10 accuracy of 95 % in retrosynthesis and 97 % for reaction prediction on a validation set of almost 1 million reactions.
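The figures in this abstract are top-10 accuracies: a test reaction counts as correctly handled if the recorded outcome appears anywhere among the model's ten highest-ranked candidates. A minimal sketch of that metric follows, assuming each prediction is a ranked list of hypothetical rule identifiers; the paper's exact evaluation protocol is not given in the abstract.

```python
from typing import Sequence


def top_k_accuracy(ranked: Sequence[Sequence[str]], truths: Sequence[str], k: int = 10) -> float:
    """Fraction of examples whose recorded answer is among the k highest-ranked predictions."""
    if len(ranked) != len(truths):
        raise ValueError("need exactly one ranked prediction list per ground-truth answer")
    if not truths:
        return 0.0
    hits = sum(1 for preds, truth in zip(ranked, truths) if truth in preds[:k])
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical rule identifiers, best-ranked first, for four test reactions.
    ranked = [["r1", "r2"], ["r3", "r1"], ["r2", "r4"], ["r5", "r6"]]
    truths = ["r1", "r1", "r4", "r7"]
    print(top_k_accuracy(ranked, truths, k=1))  # 0.25
    print(top_k_accuracy(ranked, truths, k=2))  # 0.75
```

Because a larger `k` can only add hits, top-k accuracy never decreases as `k` grows; a top-10 figure is therefore an upper bound on the corresponding top-1 figure.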

7.

Modelling Chemical Reasoning to Predict and Invent Reactions.

Segler, Marwin H S; Waller, Mark P.

Chemistry ; 23(25): 6118-6128, 2017 May 02.

Article in English | MEDLINE | ID: mdl-27862477

ABSTRACT

The ability to reason beyond established knowledge allows organic chemists to solve synthetic problems and invent novel transformations. Herein, we propose a model that mimics chemical reasoning and formalises reaction prediction as finding missing links in a knowledge graph. We have constructed a knowledge graph containing 14.4 million molecules and 8.2 million binary reactions, which represents the bulk of all chemical reactions ever published in the scientific literature. Our model outperforms a rule-based expert system in the reaction prediction task for 180 000 randomly selected binary reactions. The data-driven model generalises even beyond known reaction types, and is thus capable of effectively (re-)discovering novel transformations (even including transition metal-catalysed reactions). Our model enables computers to infer hypotheses about reactivity and reactions by considering only the intrinsic local structure of the graph. Because each single reaction prediction is typically achieved in a sub-second time frame, the model can be used as a high-throughput generator of reaction hypotheses for reaction discovery.

8.

11th German Conference on Chemoinformatics (GCC 2015) : Fulda, Germany. 8-10 November 2015.

Fechner, Uli; de Graaf, Chris; Torda, Andrew E; Güssregen, Stefan; Evers, Andreas; Matter, Hans; Hessler, Gerhard; Richmond, Nicola J; Schmidtke, Peter; Segler, Marwin H S; Waller, Mark P; Pleik, Stefanie; Shea, Joan-Emma; Levine, Zachary; Mullen, Ryan; van den Broek, Karina; Epple, Matthias; Kuhn, Hubert; Truszkowski, Andreas; Zielesny, Achim; Fraaije, Johannes Hans; Gracia, Ruben Serral; Kast, Stefan M; Bulusu, Krishna C; Bender, Andreas; Yosipof, Abraham; Nahum, Oren; Senderowitz, Hanoch; Krotzky, Timo; Schulz, Robert; Wolber, Gerhard; Bietz, Stefan; Rarey, Matthias; Zimmermann, Markus O; Lange, Andreas; Ruff, Manuel; Heidrich, Johannes; Onlia, Ionut; Exner, Thomas E; Boeckler, Frank M; Bermudez, Marcel; Firaha, Dzmitry S; Hollóczki, Oldamur; Kirchner, Barbara; Tautermann, Christofer S; Volkamer, Andrea; Eid, Sameh; Turk, Samo; Rippmann, Friedrich; Fulle, Simone.

J Cheminform ; 8(Suppl 1): 18, 2016 Apr 26.

Article in English | MEDLINE | ID: mdl-29270804

ABSTRACT
