Search | VHL Regional Portal

1.

Reinvent 4: Modern AI-driven generative molecule design.

Loeffler, Hannes H; He, Jiazhen; Tibo, Alessandro; Janet, Jon Paul; Voronov, Alexey; Mervin, Lewis H; Engkvist, Ola.

J Cheminform ; 16(1): 20, 2024 Feb 21.

Article in English | MEDLINE | ID: mdl-38383444

ABSTRACT

REINVENT 4 is a modern open-source generative AI framework for the design of small molecules. The software utilizes recurrent neural networks and transformer architectures to drive molecule generation. These generators are seamlessly embedded within the general machine learning optimization algorithms, transfer learning, reinforcement learning and curriculum learning. REINVENT 4 enables and facilitates de novo design, R-group replacement, library design, linker design, scaffold hopping and molecule optimization. This contribution gives an overview of the software and describes its design. Algorithms and their applications are discussed in detail. REINVENT 4 is a command line tool which reads a user configuration in either TOML or JSON format. The aim of this release is to provide reference implementations for some of the most common algorithms in AI based molecule generation. An additional goal with the release is to create a framework for education and future innovation in AI based molecular design. The software is available from https://github.com/MolecularAI/REINVENT4 and released under the permissive Apache 2.0 license. Scientific contribution. The software provides an open-source reference implementation for generative molecular design where the software is also being used in production to support in-house drug discovery projects. The publication of the most common machine learning algorithms in one code and full documentation thereof will increase transparency of AI and foster innovation, collaboration and education.

2.

Human-in-the-loop assisted de novo molecular design.

Sundin, Iiris; Voronov, Alexey; Xiao, Haoping; Papadopoulos, Kostas; Bjerrum, Esben Jannik; Heinonen, Markus; Patronov, Atanas; Kaski, Samuel; Engkvist, Ola.

J Cheminform ; 14(1): 86, 2022 Dec 28.

Article in English | MEDLINE | ID: mdl-36578043

ABSTRACT

A de novo molecular design workflow can be used together with technologies such as reinforcement learning to navigate the chemical space. A bottleneck in the workflow that remains to be solved is how to integrate human feedback in the exploration of the chemical space to optimize molecules. A human drug designer still needs to design the goal, expressed as a scoring function for the molecules that captures the designer's implicit knowledge about the optimization task. Little support for this task exists and, consequently, a chemist usually resorts to iteratively building the objective function of multi-parameter optimization (MPO) in de novo design. We propose a principled approach to use human-in-the-loop machine learning to help the chemist to adapt the MPO scoring function to better match their goal. An advantage is that the method can learn the scoring function directly from the user's feedback while they browse the output of the molecule generator, instead of the current manual tuning of the scoring function with trial and error. The proposed method uses a probabilistic model that captures the user's idea and uncertainty about the scoring function, and it uses active learning to interact with the user. We present two case studies for this: In the first use-case, the parameters of an MPO are learned, and in the second use-case a non-parametric component of the scoring function to capture human domain knowledge is developed. The results show the effectiveness of the methods in two simulated example cases with an oracle, achieving significant improvement in less than 200 feedback queries, for the goals of a high QED score and identifying potent molecules for the DRD2 receptor, respectively. We further demonstrate the performance gains with a medicinal chemist interacting with the system.

3.

Implications of Additivity and Nonadditivity for Machine Learning and Deep Learning Models in Drug Design.

Kwapien, Karolina; Nittinger, Eva; He, Jiazhen; Margreitter, Christian; Voronov, Alexey; Tyrchan, Christian.

ACS Omega ; 7(30): 26573-26581, 2022 Aug 02.

Article in English | MEDLINE | ID: mdl-35936431

ABSTRACT

Matched molecular pairs (MMPs) are nowadays a commonly applied concept in drug design. They are used in many computational tools for structure-activity relationship analysis, biological activity prediction, or optimization of physicochemical properties. However, until now it has not been shown in a rigorous way that MMPs, that is, changing only one substituent between two molecules, can be predicted with higher accuracy and precision in contrast to any other chemical compound pair. It is expected that any model should be able to predict such a defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant for drug design ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict, which highlights the importance of recognition of nonadditivity events and the challenging complexity of predicting properties in case of scaffold hopping. Despite deep learning being well suited to model nonlinear events, these methods do not seem to be an exception of this observation. Though they are in general performing better than classical machine learning methods, this leaves the field with a still standing challenge.

4.

DockStream: a docking wrapper to enhance de novo molecular design.

Guo, Jeff; Janet, Jon Paul; Bauer, Matthias R; Nittinger, Eva; Giblin, Kathryn A; Papadopoulos, Kostas; Voronov, Alexey; Patronov, Atanas; Engkvist, Ola; Margreitter, Christian.

J Cheminform ; 13(1): 89, 2021 Nov 17.

Article in English | MEDLINE | ID: mdl-34789335

ABSTRACT

Recently, we have released the de novo design platform REINVENT in version 2.0. This improved and extended iteration supports far more features and scoring function components, which allows bespoke and tailor-made protocols to maximize impact in small molecule drug discovery projects. A major obstacle of generative models is producing active compounds, in which predictive (QSAR) models have been applied to enrich target activity. However, QSAR models are inherently limited by their applicability domains. To overcome these limitations, we introduce a structure-based scoring component for REINVENT. DockStream is a flexible, stand-alone molecular docking wrapper that provides access to a collection of ligand embedders and docking backends. Using the benchmarking and analysis workflow provided in DockStream, execution and subsequent analysis of a variety of docking configurations can be automated. Docking algorithms vary greatly in performance depending on the target and the benchmarking and analysis workflow provides a streamlined solution to identifying productive docking configurations. We show that an informative docking configuration can inform the REINVENT agent to optimize towards improving docking scores using public data. With docking activated, REINVENT is able to retain key interactions in the binding site, discard molecules which do not fit the binding cavity, harness unused (sub-)pockets, and improve overall performance in the scaffold-hopping scenario. The code is freely available at https://github.com/MolecularAI/DockStream .

5.

Multivariate curve resolution applied to in situ X-ray absorption spectroscopy data: an efficient tool for data processing and analysis.

Voronov, Alexey; Urakawa, Atsushi; van Beek, Wouter; Tsakoumis, Nikolaos E; Emerich, Hermann; Rønning, Magnus.

Anal Chim Acta ; 840: 20-27, 2014 Aug 20.

Article in English | MEDLINE | ID: mdl-25086889

ABSTRACT

Large datasets containing many spectra commonly associated with in situ or operando experiments call for new data treatment strategies as conventional scan by scan data analysis methods have become a time-consuming bottleneck. Several convenient automated data processing procedures like least square fitting of reference spectra exist but are based on assumptions. Here we present the application of multivariate curve resolution (MCR) as a blind-source separation method to efficiently process a large data set of an in situ X-ray absorption spectroscopy experiment where the sample undergoes a periodic concentration perturbation. MCR was applied to data from a reversible reduction-oxidation reaction of a rhenium promoted cobalt Fischer-Tropsch synthesis catalyst. The MCR algorithm was capable of extracting in a highly automated manner the component spectra with a different kinetic evolution together with their respective concentration profiles without the use of reference spectra. The modulative nature of our experiments allows for averaging of a number of identical periods and hence an increase in the signal to noise ratio (S/N) which is efficiently exploited by MCR. The practical and added value of the approach in extracting information from large and complex datasets, typical for in situ and operando studies, is highlighted.
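
The reference-free decomposition described in the abstract above can be illustrated with a minimal alternating least-squares (MCR-ALS) sketch. It assumes the usual bilinear model D ≈ C·S, where each row of D is one measured spectrum, C holds the concentration profiles and S holds the component spectra, and it enforces non-negativity by simple clipping. The function names `initial_spectra` and `mcr_als`, the farthest-point initial guess, and the synthetic two-component data are illustrative assumptions; the abstract does not state which MCR variant, constraints, or initialization the authors used.

```python
import numpy as np


def initial_spectra(D, n_components):
    """Heuristic starting guess: pick mutually distant measured spectra.

    This is an assumption made for the sketch, not the authors' procedure.
    """
    # Start with the spectrum farthest from the mean spectrum.
    first = int(np.argmax(np.linalg.norm(D - D.mean(axis=0), axis=1)))
    chosen = [first]
    min_dist = np.linalg.norm(D - D[first], axis=1)
    # Greedily add the spectrum farthest from all spectra chosen so far.
    while len(chosen) < n_components:
        nxt = int(np.argmax(min_dist))
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(D - D[nxt], axis=1))
    return D[chosen].copy()


def mcr_als(D, n_components, n_iter=50):
    """Factor D (n_spectra x n_channels) into non-negative C and S, D ~ C @ S."""
    S = initial_spectra(D, n_components)
    for _ in range(n_iter):
        # Least-squares update of the concentrations for fixed spectra.
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T
        C = np.clip(C, 0.0, None)  # non-negativity constraint (by clipping)
        # Least-squares update of the spectra for fixed concentrations.
        S = np.linalg.lstsq(C, D, rcond=None)[0]
        S = np.clip(S, 0.0, None)
    return C, S


if __name__ == "__main__":
    # Synthetic stand-in for a periodic concentration perturbation:
    # two species whose fractions oscillate and always sum to one.
    t = np.linspace(0.0, 4.0 * np.pi, 200)
    c = 0.5 + 0.4 * np.sin(t)
    C_true = np.column_stack([c, 1.0 - c])
    x = np.arange(100)
    S_true = np.vstack([
        np.exp(-0.5 * ((x - 30) / 8.0) ** 2),
        np.exp(-0.5 * ((x - 60) / 8.0) ** 2),
    ])
    D = C_true @ S_true

    C, S = mcr_als(D, n_components=2)
    rel_err = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
    print("relative reconstruction error:", rel_err)
```

A factorization of this kind is only unique up to scaling and mixing of the components, so the recovered spectra reproduce the data but need not coincide with the pure-species spectra unless further constraints are applied.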
