Search | VHL Regional Portal

1.

State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis.

Tetko, Igor V; Karpov, Pavel; Van Deursen, Ruud; Godin, Guillaume.

Nat Commun ; 11(1): 5575, 2020 Nov 04.

Article in English | MEDLINE | ID: mdl-33149154

ABSTRACT

We investigated the effect of different training scenarios on predicting the (retro)synthesis of chemical compounds using a text-like representation of chemical reactions (SMILES) and the Natural Language Processing (NLP) neural network Transformer architecture. We showed that data augmentation, which is a powerful method used in image processing, eliminated the effect of data memorization by neural networks and improved their performance for prediction of new sequences. This effect was observed when augmentation was applied simultaneously to the input and the target data. The top-5 accuracy was 84.8% for the prediction of the largest fragment (thus identifying the principal transformation for classical retro-synthesis) for the USPTO-50k test dataset, and was achieved by a combination of SMILES augmentation and a beam search algorithm. The same approach provided significantly better results for the prediction of direct reactions from the single-step USPTO-MIT test set. Our model achieved 90.6% top-1 and 96.1% top-5 accuracy for its challenging mixed set and 97% top-5 accuracy for the USPTO-MIT separated set. It also significantly improved results for USPTO-full set single-step retrosynthesis for both top-1 and top-10 accuracies. The appearance frequency of the most abundantly generated SMILES was well correlated with the prediction outcome and can be used as a measure of the quality of reaction prediction.
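The closing idea of this abstract, using the frequency of the most often generated SMILES as a quality measure, can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `vote` and the example predictions are hypothetical, and the predictions are assumed to have already been canonicalized so that identical molecules compare equal as strings.

```python
from collections import Counter


def vote(predictions):
    """Return the most frequent prediction and its share of all predictions."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best, n = counts.most_common(1)[0]
    return best, n / len(predictions)


# Hypothetical canonicalized outputs obtained from five augmented
# versions of the same input reaction.
preds = ["CCO", "CCO", "CC=O", "CCO", "CCN"]
best, freq = vote(preds)
print(best, freq)
```

A high share for the winning SMILES would, under the abstract's finding, suggest a more reliable prediction; the threshold at which to trust it is not given in the abstract.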

2.

Machine Learning Predicts Degree of Aromaticity from Structural Fingerprints.

Ponting, David J; van Deursen, Ruud; Ott, Martin A.

J Chem Inf Model ; 60(10): 4560-4568, 2020 Oct 26.

Article in English | MEDLINE | ID: mdl-32966076

ABSTRACT

Prediction of whether a compound is "aromatic" is at first glance a relatively simple task: does it obey Hückel's rule (planar cyclic π-system with 4n + 2 electrons) or not? However, aromaticity is far from a binary property, and there are distinct variations in the chemical and biological behavior of different systems which obey Hückel's rule and are thus classified as aromatic. To that end, the aromaticity of each molecule in a large public dataset was quantified by an extension of the work of Raczynska et al. Building on this data, a method is proposed for machine learning the degree of aromaticity of each aromatic ring in a molecule. Categories are derived from the numeric results, allowing the differentiation of structural patterns between them and thus a better representation of the underlying chemical and biological behavior in expert and (Q)SAR systems.
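The binary test that the abstract starts from can be written in a few lines of Python. This sketch checks only the 4n + 2 electron count; planarity and cyclic conjugation are assumed to have been established separately, and the function name is hypothetical. It says nothing about the degree of aromaticity, which is the quantity the paper sets out to learn.

```python
def satisfies_huckel_count(pi_electrons: int) -> bool:
    """True if the pi-electron count equals 4n + 2 for some integer n >= 0."""
    return pi_electrons >= 2 and (pi_electrons - 2) % 4 == 0


# Electron counts for a few textbook ring systems.
print(satisfies_huckel_count(6))   # benzene
print(satisfies_huckel_count(4))   # cyclobutadiene
print(satisfies_huckel_count(10))  # naphthalene
```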

Subject(s)

Electrons , Machine Learning

3.

GEN: highly efficient SMILES explorer using autodidactic generative examination networks.

van Deursen, Ruud; Ertl, Peter; Tetko, Igor V; Godin, Guillaume.

J Cheminform ; 12(1): 22, 2020 Apr 10.

Article in English | MEDLINE | ID: mdl-33430998

ABSTRACT

Recurrent neural networks have been widely used to generate millions of de novo molecules in defined chemical spaces. Reported deep generative models are exclusively based on LSTM and/or GRU units and frequently trained using canonical SMILES. In this study, we introduce Generative Examination Networks (GEN) as a new approach to train deep generative networks for SMILES generation. In our GENs, we have used an architecture based on multiple concatenated bidirectional RNN units to enhance the validity of generated SMILES. GENs autonomously learn the target space in a few epochs and are stopped early using an independent online examination mechanism, measuring the quality of the generated set. Herein we have used online statistical quality control (SQC) on the percentage of valid molecular SMILES as examination measure to select the earliest available stable model weights. Very high levels of valid SMILES (95-98%) can be generated using multiple parallel encoding layers in combination with SMILES augmentation using unrestricted SMILES randomization. Our trained models combine an excellent novelty rate (85-90%) while generating SMILES with strong conservation of the property space (95-99%). In GENs, both the generative network and the examination mechanism are open to other architectures and quality criteria.
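The early-stopping idea described above, selecting the earliest stable model weights from the percentage of valid SMILES, might look roughly like the following Python sketch. The stability rule used here (a sliding window whose values all lie within a tolerance of the window mean, with a minimum mean) is an assumption made for illustration; the abstract does not specify the SQC rule, and the function name, parameters, and validity history are all hypothetical.

```python
def first_stable_epoch(validity, window=3, tol=1.0, minimum=90.0):
    """Return the index of the last epoch in the earliest stable window.

    A window counts as stable when its mean validity is at least `minimum`
    and every value lies within `tol` percentage points of that mean.
    Returns None if no window qualifies.
    """
    for end in range(window, len(validity) + 1):
        recent = validity[end - window:end]
        mean = sum(recent) / window
        if mean >= minimum and all(abs(v - mean) <= tol for v in recent):
            return end - 1
    return None


# Hypothetical percentage of valid SMILES measured after each epoch.
history = [40.0, 72.5, 88.0, 95.5, 96.0, 96.4, 97.0]
print(first_stable_epoch(history))
```

Other quality criteria could replace the validity percentage without changing the loop, which matches the abstract's remark that the examination mechanism is open to other criteria.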

4.

A platform for target prediction of phenotypic screening hit molecules.

Homeyer, Nadine; van Deursen, Ruud; Ochoa-Montaño, Bernardo; Heikamp, Kathrin; Ray, Peter; Zuccotto, Fabio; Blundell, Tom L; Gilbert, Ian H.

J Mol Graph Model ; 95: 107485, 2020 Mar.

Article in English | MEDLINE | ID: mdl-31836397

ABSTRACT

Many drug discovery programmes, particularly for infectious diseases, are conducted phenotypically. Identifying the targets of phenotypic screening hits experimentally can be complex, time-consuming, and expensive. However, it would be valuable to know what the molecular target(s) is, as knowledge of the binding pose of the hit molecule in the binding site can facilitate the compound optimisation. Furthermore, knowing the target would allow de-prioritisation of less attractive chemical series or molecular targets. To generate target-hypotheses for phenotypic active compounds, an in silico platform was developed that utilises both ligand and protein-structure information to generate a ranked set of predicted molecular targets. As a result of the web-based workflow the user obtains a set of 3D structures of the predicted targets with the active molecule bound. The platform was exemplified using Mycobacterium tuberculosis, the causative organism of tuberculosis. In a test that we performed, the platform was able to predict the targets of 60% of compounds investigated, where there was some similarity to a ligand in the protein database.

Subject(s)

Drug Discovery , Proteins , Binding Sites , Databases, Protein , Ligands

5.

Optimization of TRPV6 Calcium Channel Inhibitors Using a 3D Ligand-Based Virtual Screening Method.

Simonin, Céline; Awale, Mahendra; Brand, Michael; van Deursen, Ruud; Schwartz, Julian; Fine, Michael; Kovacs, Gergely; Häfliger, Pascal; Gyimesi, Gergely; Sithampari, Abilashan; Charles, Roch-Philippe; Hediger, Matthias A; Reymond, Jean-Louis.

Angew Chem Int Ed Engl ; 54(49): 14748-52, 2015 Dec 01.

Article in English | MEDLINE | ID: mdl-26457814

ABSTRACT

Herein, we report the discovery of the first potent and selective inhibitor of TRPV6, a calcium channel overexpressed in breast and prostate cancer, and its use to test the effect of blocking TRPV6-mediated Ca(2+)-influx on cell growth. The inhibitor was discovered through a computational method, xLOS, a 3D-shape and pharmacophore similarity algorithm, a type of ligand-based virtual screening (LBVS) method described briefly here. Starting with a single weakly active seed molecule, two successive rounds of LBVS followed by optimization by chemical synthesis led to a selective molecule with 0.3 µM inhibition of TRPV6. The ability of xLOS to identify different scaffolds early in LBVS was essential to success. The xLOS method may be generally useful to develop tool compounds for poorly characterized targets.

Subject(s)

Antineoplastic Agents/pharmacology , Calcium Channel Blockers/pharmacology , Drug Evaluation, Preclinical/methods , TRPV Cation Channels/antagonists & inhibitors , Antineoplastic Agents/chemical synthesis , Antineoplastic Agents/chemistry , Calcium Channel Blockers/chemical synthesis , Calcium Channel Blockers/chemistry , Calcium Channels/biosynthesis , Cell Line, Tumor , Cell Proliferation/drug effects , Dose-Response Relationship, Drug , Drug Screening Assays, Antitumor , Humans , Ligands , Molecular Structure , Structure-Activity Relationship , TRPV Cation Channels/biosynthesis

6.

MQN-mapplet: visualization of chemical space with interactive maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13.

Awale, Mahendra; van Deursen, Ruud; Reymond, Jean-Louis.

J Chem Inf Model ; 53(2): 509-18, 2013 Feb 25.

Article in English | MEDLINE | ID: mdl-23297797

ABSTRACT

The MQN-mapplet is a Java application giving access to the structure of small molecules in large databases via color-coded maps of their chemical space. These maps are projections from a 42-dimensional property space defined by 42 integer value descriptors called molecular quantum numbers (MQN), which count different categories of atoms, bonds, polar groups, and topological features and categorize molecules by size, rigidity, and polarity. Despite its simplicity, MQN-space is relevant to biological activities. The MQN-mapplet allows localization of any molecule on the color-coded images, visualization of the molecules, and identification of analogs as neighbors on the MQN-map or in the original 42-dimensional MQN-space. No query molecule is necessary to start the exploration, which may be particularly attractive for nonchemists. To our knowledge, this type of interactive exploration tool is unprecedented for very large databases such as PubChem and GDB-13 (almost one billion molecules). The application is freely available for download at www.gdb.unibe.ch.

Subject(s)

Databases, Chemical , Databases, Pharmaceutical , Small Molecule Libraries/chemistry , Ligands , Models, Molecular , Software

7.

Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17.

Ruddigkeit, Lars; van Deursen, Ruud; Blum, Lorenz C; Reymond, Jean-Louis.

J Chem Inf Model ; 52(11): 2864-75, 2012 Nov 26.

Article in English | MEDLINE | ID: mdl-23088335

ABSTRACT

Drug molecules consist of a few tens of atoms connected by covalent bonds. How many such molecules are possible in total and what is their structure? This question is of pressing interest in medicinal chemistry to help solve the problems of drug potency, selectivity, and toxicity and reduce attrition rates by pointing to new molecular series. To better define the unknown chemical space, we have enumerated 166.4 billion molecules of up to 17 atoms of C, N, O, S, and halogens forming the chemical universe database GDB-17, covering a size range containing many drugs and typical for lead compounds. GDB-17 contains millions of isomers of known drugs, including analogs with high shape similarity to the parent drug. Compared to known molecules in PubChem, GDB-17 molecules are much richer in nonaromatic heterocycles, quaternary centers, and stereoisomers, densely populate the third dimension in shape space, and represent many more scaffold types.

Subject(s)

Drug Design , Models, Chemical , Prescription Drugs/chemistry , Computer Simulation , Databases, Chemical , Databases, Pharmaceutical , Molecular Structure , Stereoisomerism , Structure-Activity Relationship

8.

Discovery of α7-nicotinic receptor ligands by virtual screening of the chemical universe database GDB-13.

Blum, Lorenz C; van Deursen, Ruud; Bertrand, Sonia; Mayer, Milena; Bürgi, Justus J; Bertrand, Daniel; Reymond, Jean-Louis.

J Chem Inf Model ; 51(12): 3105-12, 2011 Dec 27.

Article in English | MEDLINE | ID: mdl-22077916

ABSTRACT

The chemical universe database GDB-13 enumerates 977 million organic molecules up to 13 atoms of C, N, O, Cl, and S that are virtually possible following simple rules for chemical stability and synthetic feasibility. Analogs of nicotine were identified in GDB-13 using the city-block distance in MQN-space (CBD(MQN)) as a similarity measure, combined with a restriction eliminating problematic structural elements. The search was carried out with a Web browser available at www.gdb.unibe.ch. This virtual screening procedure selected 31 504 analogs of nicotine from GDB-13, of which 48 were known nicotinic ligands reported in ChEMBL. An additional 60 virtual screening hits were purchased and tested for modulation of the acetylcholine signal at the human α7 nAChR expressed in Xenopus oocytes, which led to the identification of three previously unknown inhibitors. These experiments demonstrate for the first time the use of GDB-13 for ligand discovery.
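The city-block distance used as the similarity measure is the sum of absolute differences between two descriptor vectors. The Python sketch below shows a neighbor search on that basis. For brevity it uses toy 4-component vectors rather than real 42-component MQN vectors, and the molecule names, descriptor values, and distance cutoff are hypothetical.

```python
def city_block_distance(a, b):
    """Sum of absolute component-wise differences between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def neighbors(query, library, max_distance):
    """Return (distance, name) pairs within max_distance, nearest first."""
    hits = ((city_block_distance(query, vec), name) for name, vec in library.items())
    return sorted(hit for hit in hits if hit[0] <= max_distance)


# Toy descriptor vectors standing in for 42-dimensional MQN vectors.
library = {
    "mol_a": [6, 1, 0, 1],
    "mol_b": [6, 0, 1, 1],
    "mol_c": [12, 2, 2, 3],
}
query = [6, 1, 0, 1]
print(neighbors(query, library, max_distance=5))
```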

Subject(s)

Drug Discovery , Receptors, Nicotinic/metabolism , Small Molecule Libraries/chemistry , Small Molecule Libraries/pharmacology , Animals , Databases, Factual , Gene Expression , Humans , Ligands , Molecular Motor Proteins , Receptors, Nicotinic/genetics , Xenopus , alpha7 Nicotinic Acetylcholine Receptor

9.

What we have learned from crystal structures of proteins to receptor function.

Reymond, J-L; van Deursen, Ruud; Bertrand, D.

Biochem Pharmacol ; 82(11): 1521-7, 2011 Dec 01.

Article in English | MEDLINE | ID: mdl-21787757

ABSTRACT

The activity of ligand-gated channels is crucial for proper brain function, and dysfunction of a single receptor subtype has led to neurological impairments ranging from benign to major diseases such as epilepsy, startle diseases, etc. Molecular biology and crystallography allowed the characterization at the atomic scale of the first four transmembrane ligand-gated channels and of proteins sharing a high degree of homology with the neurotransmitter-binding domain. Gaining an adequate knowledge of the structural features of the ligand binding pocket led to the possibility of developing virtual-screening-based approaches and probing in silico the docking of very large numbers of molecules. Development of new computing tools further extended such possibilities and made possible the screening of the chemical universe database GDB-11, which contains all possible organic molecules up to 11 atoms of C, N, O and F. In the case of the nicotinic acetylcholine receptors, molecules identified using such screening methods were synthesized and characterized in binding assays, and their pose was determined in crystal structures with the acetylcholine binding protein. However, in spite of these thorough approaches, functional studies revealed that these molecules had a greater affinity for the pore domain of the channel and acted as open channel blockers rather than binding site antagonists. In this work, we discuss the potential and current limitations of how progress made with the crystal structures of ligand-gated channels, or ligand binding proteins, can be used in combination with virtual screening and functional assays to identify novel compounds.

Subject(s)

Ligand-Gated Ion Channels/chemistry , Models, Molecular , Receptors, Cell Surface/chemistry , Allosteric Regulation , Animals , Computer Simulation , Crystallography, X-Ray , Drug Design , Evoked Potentials , High-Throughput Screening Assays , Humans , Ligands , Pharmaceutical Preparations/chemistry , Protein Conformation

10.

Visualisation of the chemical space of fragments, lead-like and drug-like molecules in PubChem.

van Deursen, Ruud; Blum, Lorenz C; Reymond, Jean-Louis.

J Comput Aided Mol Des ; 25(7): 649-62, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21618008

ABSTRACT

The 4.5 million organic molecules with up to 20 non-hydrogen atoms in PubChem were analyzed using the MQN-system, which consists of 42 integer value descriptors of molecular structure. The 42-dimensional MQN-space was visualised by principal component analysis and representation of the (PC1, PC2), (PC1, PC3) and (PC2, PC3) planes. The molecules were organized according to ring count (PC1, 38% of variance), the molecular size (PC2, 25% of variance), and the H-bond acceptor count (PC3, 12% of variance). Compounds following Lipinski's bioavailability, Oprea's lead-likeness and Congreve's fragment-likeness criteria formed separated groups in MQN-space visible in the (PC2, PC3) plane. MQN-similarity searches of the 4.5 million molecules (see the browser available at www.gdb.unibe.ch) gave significant enrichment factors for recovering groups of fragment-sized bioactive compounds related to ten different biological targets taken from ChEMBL, allowing lead-hopping relationships not seen with substructure fingerprint similarity searches. The diversity of different compound series was analyzed by MQN-distance histograms.

Subject(s)

Databases, Factual/classification , Drug Discovery , Informatics , Peptide Fragments/chemistry , Pharmaceutical Preparations/chemistry , Combinatorial Chemistry Techniques , Humans , Ligands , Small Molecule Libraries/chemistry

11.

Visualisation and subsets of the chemical universe database GDB-13 for virtual screening.

Blum, Lorenz C; van Deursen, Ruud; Reymond, Jean-Louis.

J Comput Aided Mol Des ; 25(7): 637-47, 2011 Jul.

Article in English | MEDLINE | ID: mdl-21618009

ABSTRACT

The chemical universe database GDB-13, which enumerates 977 million organic molecules up to 13 atoms of C, N, O, S and Cl following simple chemical stability and synthetic feasibility rules, represents a vast reservoir for new fragments. GDB-13 was classified using the MQN-system discussed in the preceding paper for the analysis of PubChem fragments. Two hundred and fifty-five subsets of GDB-13 were generated by the combinatorial use of eight restrictive criteria, including fragment-like ("rule of three") and scaffold-like (no acyclic carbon atoms) filters. Virtual screening for analogs of 15 commercial drugs of 13 non-hydrogen atoms or less shows that retrieving MQN-neighbors of a query molecule from GDB-13 or its subsets provides on average a 38-fold enrichment in structural analogs (Daylight-type substructure fingerprint Tanimoto T (SF) > 0.7), and a 75-fold enrichment in shape-similar analogs (ROCS TanimotoCombo score > 1.4). An MQN-searchable version of GDB-13 is provided at www.gdb.unibe.ch .
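The figure of 255 subsets is consistent with taking every non-empty combination of the eight criteria, since 2^8 - 1 = 255. The abstract does not spell out the construction, so the Python sketch below should be read as one plausible interpretation; the criterion names are placeholders.

```python
from itertools import combinations

# Placeholder names; the abstract names only two of the eight filters.
criteria = [f"criterion_{i}" for i in range(1, 9)]

# Every non-empty combination of the eight criteria.
subsets = [
    combo
    for size in range(1, len(criteria) + 1)
    for combo in combinations(criteria, size)
]
print(len(subsets))
```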

Subject(s)

Databases, Factual/classification , Drug Discovery , Informatics , Peptide Fragments/chemistry , Pharmaceutical Preparations/chemistry , Proteins/chemistry , User-Computer Interface , Binding Sites , Combinatorial Chemistry Techniques , Humans , Ligands

12.

Exploring the chemical space of known and unknown organic small molecules at www.gdb.unibe.ch.

Reymond, Jean-Louis; Blum, Lorenz C; van Deursen, Ruud.

Chimia (Aarau) ; 65(11): 863-7, 2011.

Article in English | MEDLINE | ID: mdl-22289373

ABSTRACT

Organic small molecules are of particular interest for medicinal chemistry since they comprise many biologically active compounds which are potential drugs. To understand this vast chemical space, we are enumerating all possible organic molecules to create the chemical universe database GDB, which currently comprises 977 million molecules up to 13 atoms of C, N, O, Cl and S. Furthermore, we have established a simple classification method for organic molecules in the form of the MQN (molecular quantum numbers) system, which is an equivalent of the periodic system of the elements. Despite its simplicity, the 42-dimensional MQN system is surprisingly relevant with respect to bioactivity, as evidenced by the fact that groups of biosimilar compounds form close groups in MQN space. The MQN space of the known organic molecules in PubChem and of the unknown molecules in the Chemical Universe Database GDB-13 can be searched interactively using browser tools freely accessible at www.gdb.unibe.ch.

Subject(s)

Organic Chemicals/chemistry , Databases, Factual , Internet

13.

A searchable map of PubChem.

van Deursen, Ruud; Blum, Lorenz C; Reymond, Jean-Louis.

J Chem Inf Model ; 50(11): 1924-34, 2010 Nov 22.

Article in English | MEDLINE | ID: mdl-20945869

ABSTRACT

The database PubChem was classified using 42 integer value descriptors of molecular structure, here called molecular quantum numbers (MQNs), which count atoms and bond types, polar groups, and topological features. Principal component analysis of the MQN data set shows that PubChem compounds occupy a partially filled elliptical cone in the (PC1,PC2,PC3) space whose axis is the first principal component PC1 (65% variability) representing molecular size, and the ellipse axes are PC2 (18% variability, representing structural flexibility) and PC3 (7% variability, representing polarity). A visual overview of PubChem is provided by color-coded representations of the (PC2,PC3) plane. The MQNs form a scalar fingerprint which can be used to measure the similarity between pairs of molecules and enable ligand-based virtual screening, as illustrated for the enrichment of bioactives from the DUD data set from PubChem. An MQN-annotated version of PubChem with an MQN-similarity search tool is available at www.gdb.unibe.ch .

Subject(s)

Data Mining/methods , Databases, Factual/classification , Computer Graphics , Drug Evaluation, Preclinical , User-Computer Interface

14.

Exploring α7-Nicotinic Receptor Ligand Diversity by Scaffold Enumeration from the Chemical Universe Database GDB.

Garcia-Delgado, Noemi; Bertrand, Sonia; Nguyen, Kong T; van Deursen, Ruud; Bertrand, Daniel; Reymond, Jean-Louis.

ACS Med Chem Lett ; 1(8): 422-6, 2010 Nov 11.

Article in English | MEDLINE | ID: mdl-24900227

ABSTRACT

Virtual analogues (1167860 compounds) of the nicotinic α7-receptor (α7 nAChR) ligands PNU-282,987 and SSR180711 were generated from the chemical universe database GDB-11 by extracting all aliphatic diamine analogues of the aminoquinuclidine and 1,4-diazabicyclo[3.2.2]nonane scaffolds of these ligands and converting them to the corresponding aryl amides using five different aromatic acyl groups. The library was ranked by docking to the nicotinic binding site of the acetylcholine binding protein (AChBP, 1UW6.pdb) using Autodock and Glide. Thirty-eight ligands derived from the best docking hits were synthesized and tested for modulation of the acetylcholine signal at the human α7 nAChR receptor expressed in Xenopus oocytes, leading to competitive and noncompetitive antagonists with IC50 = 5-7 µM. These experiments demonstrate the first example of using GDB in a fragment-based approach by diversifying the scaffold of known drugs.

15.

Classification of organic molecules by molecular quantum numbers.

Nguyen, Kong T; Blum, Lorenz C; van Deursen, Ruud; Reymond, Jean-Louis.

ChemMedChem ; 4(11): 1803-5, 2009 Nov.

Article in English | MEDLINE | ID: mdl-19774591

Subject(s)

Organic Chemicals/classification , Quantum Theory , Zinc/chemistry , Databases, Factual , Organic Chemicals/chemistry , Principal Component Analysis

16.

Chemical space travel.

van Deursen, Ruud; Reymond, Jean-Louis.

ChemMedChem ; 2(5): 636-40, 2007 May.

Article in English | MEDLINE | ID: mdl-17366512

Subject(s)

Drug Design , Molecular Structure , Mutation

17.

A putative consensus sequence for the nucleotide-binding site of annexin A6.

Bandorowicz-Pikula, Joanna; Kirilenko, Aneta; van Deursen, Ruud; Golczak, Marcin; Kühnel, Michael; Lancelin, Jean-Marc; Pikula, Slawomir; Buchet, René.

Biochemistry ; 42(30): 9137-46, 2003 Aug 05.

Article in English | MEDLINE | ID: mdl-12885247

ABSTRACT

Reaction-induced infrared difference spectroscopy (RIDS) has been used to investigate the nature of interactions of human annexin A6 (ANXA6) with nucleotides. RIDS results for ANXA6, obtained after the photorelease of GTPγS, ATP, or P(i) from the respective caged compounds, were identical, suggesting that the interactions between the nucleotide and ANXA6 were dominated by the phosphate groups. Phosphate-induced structural changes in ANXA6 were small and affected only seven or eight amino acid residues. The GTP fluorescent analogue, 2'(3')-O-(2,4,6-trinitrophenyl)guanosine 5'-triphosphate (TNP-GTP), quenched tryptophan fluorescence of ANXA6 when bound to the protein. A binding stoichiometry of 1 mol of nucleotide/mol ANXA6 was established with a K(D) value of 2.8 µM for TNP-GTP. The bands observed on RIDS of ANXA6 halves (e.g., N-terminal half, ANXA6a, and C-terminal half, ANXA6b) were similar to those of the whole molecule. However, their amplitudes were smaller by a factor of 2 compared to those of whole ANXA6. TNP-GTP bound to both fragments of ANXA6 with a stoichiometry of 0.5 mol/mol. However, the binding affinities of ANXA6a and ANXA6b differed from that of ANXA6. Simulated molecular modeling revealed a nucleotide-binding site which was distributed in two distinct domains. Residues K296, Y297, K598, and K644 of ANXA6 were less than 3 Å from the bound phosphate groups of either GTP or ATP. The presence of two identical sequences in ANXA6 with the F-X-X-K-Y-D/E-K-S-L motif, located in the middle of ANXA6, at residues 293-301 (within ANXA6a) and at 641-649 (within ANXA6b), suggested that the F-X-X-K-Y-D/E-K-S-L motif was the putative sequence in ANXA6 for nucleotide binding.
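The F-X-X-K-Y-D/E-K-S-L motif translates directly into a regular expression over one-letter amino acid codes, with X as any residue and D/E as a two-residue choice. The Python sketch below scans a sequence for it and reports 1-based start positions. The test sequence is made up for illustration and is not the ANXA6 sequence; the function name is hypothetical.

```python
import re

# F, two arbitrary residues, K, Y, D or E, K, S, L (nine residues in total).
MOTIF = re.compile(r"F..KY[DE]KSL")


def find_motif(sequence):
    """Return (1-based start position, matched residues) for each motif hit."""
    return [(m.start() + 1, m.group()) for m in MOTIF.finditer(sequence)]


# Made-up sequence containing one D variant and one E variant of the motif.
seq = "AAAFGGKYDKSLAAAFTTKYEKSLAA"
print(find_motif(seq))
```

The nine-residue match length agrees with the residue ranges quoted in the abstract (293-301 and 641-649).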

Subject(s)

Adenosine Triphosphate/metabolism , Annexin A6/chemistry , Annexin A6/metabolism , Consensus Sequence , Guanosine Triphosphate/analogs & derivatives , Guanosine Triphosphate/metabolism , Adenosine Triphosphate/chemistry , Amino Acid Sequence , Binding Sites , Computer Simulation , Crystallography, X-Ray , Guanosine 5'-O-(3-Thiotriphosphate)/chemistry , Guanosine 5'-O-(3-Thiotriphosphate)/metabolism , Guanosine Triphosphate/chemistry , Humans , Models, Molecular , Molecular Sequence Data , Phosphates/chemistry , Phosphates/metabolism , Protein Isoforms/chemistry , Protein Isoforms/metabolism , Protein Structure, Tertiary , Recombinant Proteins/chemistry , Recombinant Proteins/metabolism , Spectrometry, Fluorescence , Spectroscopy, Fourier Transform Infrared , Thionucleotides/chemistry , Thionucleotides/metabolism
