Results 1 - 4 of 4
1.
Chem Sci ; 11(12): 3316-3325, 2020 Mar 03.
Article in English | MEDLINE | ID: mdl-34122839

ABSTRACT

We present an extension of our Molecular Transformer model combined with a hypergraph exploration strategy for automatic retrosynthetic route planning without human intervention. The single-step retrosynthetic model sets a new state of the art for predicting reactants as well as reagents, solvents and catalysts for each retrosynthetic step. We introduce four metrics (coverage, class diversity, round-trip accuracy and Jensen-Shannon divergence) to evaluate single-step retrosynthetic models, using a forward-prediction model and a reaction classification model, both likewise based on the transformer architecture. The hypergraph is constructed on the fly, and its nodes are filtered and further expanded based on a Bayesian-like probability. We critically assessed the end-to-end framework on several retrosynthesis examples from the literature and from academic exams. Overall, the framework performs very well, with few weaknesses, related mainly to the training data. The introduced metrics open up the possibility of optimizing an entire retrosynthetic framework by focusing on the performance of the single-step model alone.
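One of the four metrics named in this abstract, the Jensen-Shannon divergence, can be sketched in a few lines. This is a generic JSD over discrete distributions, not the paper's exact evaluation code; the toy reaction-class distributions `p` and `q` are illustrative:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence (natural log), D(p || q).

    Assumes p and q are aligned lists of probabilities over
    the same support (e.g. reaction classes).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical reaction-class distributions produced by two
# single-step retrosynthetic models over three classes.
p = [0.6, 0.3, 0.1]
q = [0.5, 0.3, 0.2]
score = js_divergence(p, q)
```

A lower divergence between the class distribution of proposed disconnections and a reference distribution indicates the model covers reaction classes in similar proportions.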

2.
ACS Cent Sci ; 5(9): 1572-1583, 2019 Sep 25.
Article in English | MEDLINE | ID: mdl-31572784

ABSTRACT

Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: Given reactants and reagents, predict the products. Similar to other work, we treat reaction prediction as a machine translation problem between simplified molecular-input line-entry system (SMILES) strings (a text-based representation) of reactants, reagents, and the products. We show that a multihead attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark data set. Molecular Transformer makes predictions by inferring the correlations between the presence and absence of chemical motifs in the reactant, reagent, and product present in the data set. Our model requires no handcrafted rules and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without a reactant-reagent split and including stereochemistry, which makes our method universally applicable.
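The uncertainty score described here derives from the decoder's per-token probabilities. A minimal sketch of that idea (the function name and the example log-probabilities are hypothetical, not the paper's exact scoring code):

```python
import math

def sequence_confidence(token_logprobs):
    """Confidence of a predicted product SMILES as the product of
    per-token decoding probabilities, i.e. exp(sum of log-probs).

    A generic sequence-likelihood score in the spirit of the
    uncertainty estimate described in the abstract.
    """
    return math.exp(sum(token_logprobs))

# Hypothetical log-probabilities emitted while decoding a short
# product SMILES ("CCO") token by token.
logps = [math.log(0.99), math.log(0.95), math.log(0.97)]
conf = sequence_confidence(logps)
```

Predictions whose confidence falls below a calibrated threshold can then be flagged as likely incorrect, which is how such a score supports classifying predictions as right or wrong.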

3.
Chem Sci ; 9(28): 6091-6098, 2018 Jul 28.
Article in English | MEDLINE | ID: mdl-30090297

ABSTRACT

There is an intuitive analogy between an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts of linguistic analysis and examine their potential impact on organic chemistry. In this work, we cast the reaction prediction task as a translation problem, introducing a template-free sequence-to-sequence model that is trained end-to-end and fully data-driven. We propose a tokenization scheme that is arbitrarily extensible with reaction information. Using an attention-based model borrowed from human language translation, we improve on the state-of-the-art solutions in reaction prediction, achieving a top-1 accuracy of 80.3% without relying on auxiliary knowledge such as reaction templates or explicit atomic features. A top-1 accuracy of 65.4% is also reached on a larger and noisier dataset.
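A SMILES tokenization of the kind this abstract describes is commonly implemented as a single regular expression that keeps bracket atoms, two-letter halogens and ring-closure digits as whole tokens. The pattern below is a sketch in that spirit, not necessarily the paper's exact regex:

```python
import re

# One capturing group covering bracket atoms ([...]), two-letter
# halogens (Br, Cl), organic-subset atoms, bonds, branches,
# ring-closure digits and the reaction arrow.
PATTERN = (
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p"
    r"|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>>?|\*|\$|%[0-9]{2}|[0-9])"
)

def tokenize(smiles):
    """Split a SMILES string into chemically meaningful tokens."""
    return re.findall(PATTERN, smiles)

tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
```

Each token then maps to one entry of the sequence-to-sequence model's vocabulary, so the "translation" operates on chemically meaningful units rather than raw characters.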

4.
Philos Trans A Math Phys Eng Sci ; 372(2018): 20130278, 2014 Jun 28.
Article in English | MEDLINE | ID: mdl-24842033

ABSTRACT

Power awareness is fast becoming immensely important in computing, ranging from traditional high-performance computing applications to the new generation of data-centric workloads. In this work, we describe our efforts towards a power-efficient computing paradigm that combines low- and high-precision arithmetic. We showcase our ideas on the widely used kernel of solving systems of linear equations, which finds numerous applications in scientific and engineering disciplines as well as in large-scale data analytics, statistics and machine learning. Towards this goal, we developed tools for the seamless power profiling of applications at a fine-grained level. In addition, we verify previous work on post-FLOPS/W metrics and show that these can shed much more light on the power/energy profile of important applications.
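The standard way to combine low- and high-precision arithmetic for linear systems is mixed-precision iterative refinement: solve cheaply in low precision, then correct with residuals accumulated in high precision. A minimal sketch of that general technique (not the authors' implementation), using float32/float64 as stand-ins for the low/high pair:

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Iterative refinement: solve in float32 (cheaper, lower power),
    accumulate residuals and the solution in float64.
    """
    A32 = A.astype(np.float32)
    # Initial low-precision solve, promoted to float64.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                   # residual in float64
        dx = np.linalg.solve(A32, r.astype(np.float32)) # correction in float32
        x += dx.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)  # well-conditioned
b = rng.standard_normal(50)
x = mixed_precision_solve(A, b)
```

For well-conditioned systems the refined solution reaches near float64 accuracy while the dominant O(n^3) factorization cost stays in the cheaper precision, which is exactly the power/accuracy trade-off the abstract targets.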
