Results 1 - 20 of 24
1.
J Am Soc Mass Spectrom ; 34(8): 1584-1592, 2023 Aug 02.
Article in English | MEDLINE | ID: mdl-37390315

ABSTRACT

During the past decade, promising methods for computational prediction of electron ionization mass spectra have been developed. The most prominent ones are based on quantum chemistry (QCEIMS) and machine learning (CFM-EI, NEIMS). Here we provide a threefold comparison of these methods with respect to spectral prediction and compound identification. We found that there is no unambiguous way to determine the best of these three methods. Among other factors, we find that the choice of spectral distance function plays an important role in the performance for compound identification.
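
To illustrate why the choice of spectral distance function can matter, here is a minimal sketch of two common similarity measures: a plain cosine similarity and a peak-weighted variant. The abstract does not say which functions the authors compared, so the function names, the weighting exponents, and the toy spectra below are all assumptions made for illustration.

```python
import math

def cosine_similarity(spec_a, spec_b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def weighted_cosine_similarity(spec_a, spec_b, mz_power=1.0, intensity_power=0.5):
    """Cosine similarity after reweighting each peak as mz**a * intensity**b.

    The default exponents are illustrative assumptions, not values from the paper.
    """
    def reweight(spec):
        return {mz: (mz ** mz_power) * (i ** intensity_power) for mz, i in spec.items()}
    return cosine_similarity(reweight(spec_a), reweight(spec_b))

# Hypothetical toy spectra (integer m/z -> relative intensity), not real data.
measured = {41: 30.0, 43: 100.0, 57: 60.0, 71: 20.0}
predicted_1 = {41: 25.0, 43: 90.0, 57: 70.0, 85: 10.0}
predicted_2 = {43: 100.0, 57: 10.0, 71: 80.0}

for name, pred in [("predicted_1", predicted_1), ("predicted_2", predicted_2)]:
    print(name,
          round(cosine_similarity(measured, pred), 3),
          round(weighted_cosine_similarity(measured, pred), 3))
```

Because the weighted variant emphasizes high-mass peaks and compresses intensities, two candidate spectra can score quite differently under the two measures, which is one way the distance function can influence identification results.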

2.
Chem Commun (Camb) ; 59(45): 6865-6868, 2023 Jun 01.
Article in English | MEDLINE | ID: mdl-37195424

ABSTRACT

We report the co-polymerization of glycol nucleic acid (GNA) monomers with unsubstituted and substituted dicarboxylic acid linkers under plausible early Earth aqueous dry-down conditions. Both linear and branched co-polymers are produced. Mechanistic aspects of the reaction and potential roles of these polymers in prebiotic chemistry are discussed.

3.
J Mol Evol ; 90(3-4): 307-323, 2022 08.
Article in English | MEDLINE | ID: mdl-35666290

ABSTRACT

Recent findings, in vitro and in silico, are strengthening the idea of a simpler, earlier stage of genetically encoded proteins which used amino acids produced by prebiotic chemistry. These findings motivate a re-examination of prior work which has identified unusual properties of the set of twenty amino acids found within the full genetic code, while leaving it unclear whether similar patterns also characterize the subset of prebiotically plausible amino acids. We have suggested previously that this ambiguity may result from the low number of amino acids recognized by the definition of prebiotic plausibility used for the analysis. Here, we test this hypothesis using significantly updated data for organic material detected within meteorites, which contain several coded and non-coded amino acids absent from prior studies. In addition to confirming the well-established idea that "late" arriving amino acids expanded the chemistry space encoded by genetic material, we find that a prebiotically plausible subset of coded amino acids generally emulates the patterns found in the full set of 20, namely an exceptionally broad and even distribution of volumes and an exceptionally even distribution of hydrophobicities (quantified as logP) over a narrow range. However, the strength of this pattern varies depending on both the size and composition of the library used to create a background (null model) for a random alphabet, and the precise definition of exactly which amino acids were present in a simpler, earlier code. Findings support the idea that a small sample size of amino acids caused previous ambiguous results, and further improvements in meteorite analysis and/or prebiotic simulations will further clarify the nature and extent of unusual properties. We discuss the case of sulfur-containing amino acids as a specific and clear example and conclude by reviewing the potential impact of better understanding the chemical "logic" of a smaller forerunner to the standard amino acid alphabet.


Subjects
Amino Acids , Proteins , Amino Acids/chemistry , Amino Acids/genetics , Genetic Code , Humans , Proteins/chemistry , Proteins/genetics
4.
Chem Sci ; 13(17): 4838-4853, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35655880

ABSTRACT

A central question in origins of life research is how non-entailed chemical processes, which simply dissipate chemical energy because they can do so due to immediate reaction kinetics and thermodynamics, enabled the origin of highly-entailed ones, in which concatenated kinetically and thermodynamically favorable processes enhanced some processes over others. Some degree of molecular complexity likely had to be supplied by environmental processes to produce entailed self-replicating processes. The origin of entailment, therefore, must connect to fundamental chemistry that builds molecular complexity. We present here an open-source chemoinformatic workflow to model abiological chemistry to discover such entailment. This pipeline automates generation of chemical reaction networks and their analysis to discover novel compounds and autocatalytic processes. We demonstrate this pipeline's capabilities against a well-studied model system by vetting it against experimental data. This workflow can enable rapid identification of products of complex chemistries and their underlying synthetic relationships to help identify autocatalysis, and potentially self-organization, in such systems. The algorithms used in this study are open-source and reconfigurable by other user-developed workflows.

6.
J Chem Inf Model ; 59(10): 4266-4277, 2019 10 28.
Article in English | MEDLINE | ID: mdl-31498614

ABSTRACT

Biology encodes hereditary information in DNA and RNA, which are finely tuned to their biological functions and modes of biological production. The central role of nucleic acids in biological information flow makes them key targets of pharmaceutical research. Indeed, other nucleic acid-like polymers can play similar roles to natural nucleic acids both in vivo and in vitro; yet despite remarkable advances over the last few decades, much remains unknown regarding which structures are compatible with molecular information storage. Chemical space describes the structures and properties of molecules that could exist within a given molecular formula or other classification system. Using structure generation methods, we explore nucleic acid analogues within the formula ranges BC3-7H5-15O2-4 and BC3-6H5-15N1-2O0-4, where B is a recognition element (e.g., a nucleobase). Other restrictions included two obligatory points of attachment for inclusion into a linear polymer and substructures predicting chemical stability. These sets contain 86,007 (CHO) and 75,309 (CHNO) compositionally isomeric structures, representing 706,568 CHO and 454,422 CHNO stereoisomers, that diversely and densely occupy this space. These libraries point toward there being large spaces of unexplored chemistry relevant to pharmacology and biochemistry and efforts to understand the origins of life.
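
The formula ranges quoted above define a finite grid of candidate molecular formulas. As a rough sketch only, the snippet below enumerates that grid for both ranges. It does not generate structures, check valence rules, or apply the stability and attachment-point restrictions the abstract mentions, and the helper names are hypothetical.

```python
from itertools import product

def enumerate_formulas(ranges):
    """Yield every element-count combination within inclusive (low, high) ranges."""
    elements = list(ranges)
    spans = [range(lo, hi + 1) for lo, hi in ranges.values()]
    for counts in product(*spans):
        yield dict(zip(elements, counts))

def formula_string(counts):
    """Render counts as a formula; 'B' stands for the recognition element."""
    return "B" + "".join(f"{el}{n}" for el, n in counts.items() if n > 0)

# Ranges taken from the abstract: BC3-7H5-15O2-4 and BC3-6H5-15N1-2O0-4.
cho = {"C": (3, 7), "H": (5, 15), "O": (2, 4)}
chno = {"C": (3, 6), "H": (5, 15), "N": (1, 2), "O": (0, 4)}

cho_formulas = [formula_string(f) for f in enumerate_formulas(cho)]
chno_formulas = [formula_string(f) for f in enumerate_formulas(chno)]
print(len(cho_formulas), cho_formulas[0], cho_formulas[-1])
print(len(chno_formulas), chno_formulas[0], chno_formulas[-1])
```

Many formulas on this grid admit no chemically valid structure at all, while others admit thousands of isomers, which is why a dedicated structure generator is needed to arrive at the structure counts reported in the abstract.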


Subjects
Nucleic Acid Databases , Nucleic Acids/chemistry , Small Molecule Libraries , Cheminformatics , Drug Discovery , Nucleic Acid Conformation
7.
Sci Rep ; 9(1): 12468, 2019 08 28.
Article in English | MEDLINE | ID: mdl-31462646

ABSTRACT

Life uses a common set of 20 coded amino acids (CAAs) to construct proteins. This set was likely canonicalized during early evolution; before this, smaller amino acid sets were gradually expanded as new synthetic, proofreading and coding mechanisms became biologically available. Many possible subsets of the modern CAAs or other presently uncoded amino acids could have comprised the earlier sets. We explore the hypothesis that the CAAs were selectively fixed due to their unique adaptive chemical properties, which facilitate folding, catalysis, and solubility of proteins, and gave adaptive value to organisms able to encode them. Specifically, we studied in silico hypothetical CAA sets of 3-19 amino acids comprised of 1913 structurally diverse α-amino acids, exploring the adaptive value of their combined physicochemical properties relative to those of the modern CAA set. We find that even hypothetical sets containing modern CAA members are especially adaptive; it is difficult to find sets even among a large choice of alternatives that cover the chemical property space more amply. These results suggest that each time a CAA was discovered and embedded during evolution, it provided an adaptive value unusual among many alternatives, and each selective step may have helped bootstrap the developing set to include still more CAAs.


Subjects
Amino Acids/chemistry , Molecular Evolution , Chemical Models , Protein Folding , Proteins/chemistry , Proteins/genetics
8.
Faraday Discuss ; 218(0): 9-28, 2019 08 15.
Article in English | MEDLINE | ID: mdl-31317165

ABSTRACT

Understanding complex (bio/geo)systems is a pivotal challenge in modern sciences that fuels a constant development of modern analytical technology, finding innovative solutions to resolve and analyse. In this introductory paper to the Faraday Discussion "Challenges in the analysis of complex natural systems", we aim to present concepts of complexity, and complex chemistry in systems subjected to biotic and abiotic transformations, and introduce the analytical possibilities to disentangle chemical complexity into its elementary parts (i.e. compositional and structural resolution) as a global integrated approach termed systems chemical analytics.

9.
Sci Rep ; 7(1): 17540, 2017 12 13.
Article in English | MEDLINE | ID: mdl-29235498

ABSTRACT

The reverse tricarboxylic acid (rTCA) cycle has been explored from various standpoints as an idealized primordial metabolic cycle. Its simplicity and apparent ubiquity in diverse organisms across the tree of life have been used to argue for its antiquity and its optimality. In 2000 it was proposed that chemoinformatics approaches support some of these views. Specifically, defined queries of the Beilstein database showed that the molecules of the rTCA are heavily represented in such compound databases. We explore here the chemical structure "space," e.g. the set of organic compounds which possesses some minimal set of defining characteristics, of the rTCA cycle's intermediates using an exhaustive structure generation method. The rTCA's chemical space as defined by the original criteria and explored by our method is some six to seven times larger than originally considered. Acknowledging that each assumption in what is a defining criterion making the rTCA cycle special limits possible generative outcomes, there are many unrealized compounds which fulfill these criteria. That these compounds are unrealized could be due to evolutionary frozen accidents or optimization, though this optimization may also be for systems-level reasons, e.g., the way the pathway and its elements interface with other aspects of metabolism.


Subjects
Citric Acid Cycle , Computer Simulation , Molecular Models , Computational Biology , Stereoisomerism , Tricarboxylic Acids/chemistry
10.
Philos Trans A Math Phys Eng Sci ; 375(2109)2017 Dec 28.
Article in English | MEDLINE | ID: mdl-29133444

ABSTRACT

The origin of life is typically understood as a transition from inanimate or disorganized matter to self-organized, 'animate' matter. This transition probably took place largely in the context of organic compounds, and most approaches, to date, have focused on using the organic chemical composition of modern organisms as the main guide for understanding this process. However, it has gradually come to be appreciated that biochemistry, as we know it, occupies a minute volume of the possible organic 'chemical space'. As the majority of abiotic syntheses appear to make a large set of compounds not found in biochemistry, as well as an incomplete subset of those that are, it is possible that life began with a significantly different set of components. Chemical graph-based structure generation methods allow for exhaustive in silico enumeration of different compound types and different types of 'chemical spaces' beyond those used by biochemistry, which can be explored to help understand the types of compounds biology uses, as well as to understand the nature of abiotic synthesis, and potentially design novel types of living systems. This article is part of the themed issue 'Reconceptualizing the origins of life'.


Subjects
Computer Simulation , Exobiology , Life , Amino Acids/metabolism , Origin of Life
11.
Astrobiology ; 15(7): 538-58, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26200431

ABSTRACT

Ribonucleic acid (RNA) is one of the two nucleic acids used by extant biochemistry and plays a central role as the intermediary carrier of genetic information in transcription and translation. If RNA was involved in the origin of life, it should have a facile prebiotic synthesis. A wide variety of such syntheses have been explored. However, to date no one-pot reaction has been shown capable of yielding RNA monomers from likely prebiotically abundant starting materials, though this does not rule out the possibility that simpler, more easily prebiotically accessible nucleic acids may have preceded RNA. Given structural constraints, such as the ability to form complementary base pairs and a linear covalent polymer, a variety of structural isomers of RNA could potentially function as genetic platforms. By using structure-generation software, all the potential structural isomers of the ribosides (BC5H9O4, where B is nucleobase), as well as a set of simpler minimal analogues derived from them, that can potentially serve as monomeric building blocks of nucleic acid-like molecules are enumerated. Molecules are selected based on their likely stability under biochemically relevant conditions (e.g., moderate pH and temperature) and the presence of at least two functional groups allowing the monomers to be incorporated into linear polymers. The resulting structures are then evaluated by using molecular descriptors typically applied in quantitative structure-property relationship (QSPR) studies and predicted physicochemical properties. Several databases have been queried to determine whether any of the computed isomers had been synthesized previously. Very few of the molecules that emerge from this structure set have been previously described. We conclude that ribonucleosides may have competed with a multitude of alternative structures whose potential proto-biochemical roles and abiotic syntheses remain to be explored.


Subjects
RNA/chemistry , Ribonucleosides/chemistry , Chemical Phenomena , Computer Simulation , Isomerism , Origin of Life , Polymerization , Quantitative Structure-Activity Relationship , RNA/chemical synthesis
12.
Sci Rep ; 5: 9414, 2015 Mar 24.
Article in English | MEDLINE | ID: mdl-25802223

ABSTRACT

Using novel advances in computational chemistry, we demonstrate that the set of 20 genetically encoded amino acids, used nearly universally to construct all coded terrestrial proteins, has been highly influenced by natural selection. We defined an adaptive set of amino acids as one whose members thoroughly cover relevant physico-chemical properties, or "chemistry space." Using this metric, we compared the encoded amino acid alphabet to random sets of amino acids. These random sets were drawn from a computationally generated compound library containing 1913 alternative amino acids that lie within the molecular weight range of the encoded amino acids. Sets that cover chemistry space better than the genetically encoded alphabet are extremely rare and energetically costly. Further analysis of more adaptive sets reveals common features and anomalies, and we explore their implications for synthetic biology. We present these computations as evidence that the set of 20 amino acids found within the standard genetic code is the result of considerable natural selection. The amino acids used for constructing coded proteins may represent a largely global optimum, such that any aqueous biochemistry would use a very similar set.
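
The comparison against random sets described above can be sketched as a simple sampling test. The abstract does not give the exact coverage metric, so this sketch assumes a one-dimensional property scored by range and by evenness of spacing, and it uses synthetic numbers rather than real amino acid data. All names are hypothetical.

```python
import random
import statistics

def coverage(values):
    """Return (range, evenness) for one property; higher is better for both.

    Evenness is the negative variance of the gaps between sorted values, so
    perfectly even spacing scores 0 and clumped values score lower.
    """
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return ordered[-1] - ordered[0], -statistics.pvariance(gaps)

def fraction_better(reference, library, n_samples=2000, seed=0):
    """Fraction of random same-size sets that beat the reference on both measures."""
    rng = random.Random(seed)
    ref_range, ref_even = coverage(reference)
    better = 0
    for _ in range(n_samples):
        sample = rng.sample(library, len(reference))
        s_range, s_even = coverage(sample)
        if s_range > ref_range and s_even > ref_even:
            better += 1
    return better / n_samples

# Synthetic, hypothetical property values; not real amino acid data.
data_rng = random.Random(42)
library = [data_rng.gauss(0.0, 1.0) for _ in range(1913)]
reference = [-2.0 + 4.0 * i / 19 for i in range(20)]  # evenly spaced toy "alphabet"

print(fraction_better(reference, library))
```

A small returned fraction means that few random sets cover the property axis better than the reference set, which is the kind of evidence the abstract describes; a real analysis would use several properties and the actual compound library.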


Subjects
Biological Adaptation/genetics , Amino Acids/genetics , Codon , Amino Acids/chemistry , Genetic Selection
13.
J Chem Inf Model ; 53(11): 2851-62, 2013 Nov 25.
Article in English | MEDLINE | ID: mdl-24152173

ABSTRACT

α-Amino acids are fundamental to biochemistry as the monomeric building blocks with which cells construct proteins according to genetic instructions. However, the 20 amino acids of the standard genetic code represent a tiny fraction of the number of α-amino acid chemical structures that could plausibly play such a role, both from the perspective of natural processes by which life emerged and evolved, and from the perspective of human-engineered genetically coded proteins. Until now, efforts to describe the structures comprising this broader set, or even estimate their number, have been hampered by the complex combinatorial properties of organic molecules. Here, we use computer software based on graph theory and constructive combinatorics in order to conduct an efficient and exhaustive search of the chemical structures implied by two careful and precise definitions of the α-amino acids relevant to coded biological proteins. Our results include two virtual libraries of α-amino acid structures corresponding to these different approaches, comprising 121 044 and 3 846 structures, respectively, and suggest a simple approach to exploring much larger, as yet uncomputed, libraries of interest.


Subjects
Amino Acids/chemistry , Molecular Evolution , Proteins/chemistry , Software , Algorithms , Combinatorial Chemistry Techniques , Genetic Engineering , Humans , Stereoisomerism
14.
Metabolites ; 3(2): 440-62, 2013 May 28.
Article in English | MEDLINE | ID: mdl-24958000

ABSTRACT

This paper details the MOLGEN entries for the 2012 CASMI contest for small molecule identification to demonstrate structure elucidation using structure generation approaches. Different MOLGEN programs were used for different categories, including MOLGEN-MS/MS for Category 1, MOLGEN 3.5 and 5.0 for Category 2 and MOLGEN-MS for Categories 3 and 4. A greater focus is given to Categories 1 and 2, as most CASMI participants entered these categories. The settings used and the reasons behind them are described in detail, while various evaluations are used to put these results into perspective. As one author was also an organiser of CASMI, these submissions were not part of the official CASMI competition, but this paper provides an insight into how unknown identification could be performed using structure generation approaches. The approaches are semi-automated (category dependent) and benefit greatly from user experience. Thus, the results presented and discussed here may be better than those an inexperienced user could obtain with MOLGEN programs.

15.
Anal Chem ; 84(7): 3287-95, 2012 Apr 03.
Article in English | MEDLINE | ID: mdl-22414024

ABSTRACT

This article explores consensus structure elucidation on the basis of GC/EI-MS, structure generation, and calculated properties for unknown compounds. Candidate structures were generated using the molecular formula and substructure information obtained from GC/EI-MS spectra. Calculated properties were then used to score candidates according to a consensus approach, rather than filtering or exclusion. Two mass spectral match calculations (MOLGEN-MS and MetFrag), retention behavior (Lee retention index/boiling point correlation, NIST Kovats retention index), octanol-water partitioning behavior (log Kow), and finally steric energy calculations were used to select candidates. A simple consensus scoring function was developed and tested on two unknown spectra detected in a mutagenic subfraction of a water sample from the Elbe River using GC/EI-MS. The top candidates proposed using the consensus scoring technique were purchased and confirmed analytically using GC/EI-MS and LC/MS/MS. Although the compounds identified were not responsible for the sample mutagenicity, the structure-generation-based identification for GC/EI-MS using calculated properties and consensus scoring was demonstrated to be applicable to real-world unknowns and suggests that the development of a similar strategy for multidimensional high-resolution MS could improve the outcomes of environmental and metabolomics studies.
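
The idea of scoring candidates by consensus, rather than excluding them on any single criterion, can be sketched as follows. The abstract does not give the authors' scoring function, so this is an assumed form: each criterion is min-max normalized across the candidates and the normalized values are averaged with optional weights. The criterion names and candidate values are hypothetical.

```python
def min_max_normalize(values, higher_is_better=True):
    """Scale values to [0, 1] so that 1 always marks the best candidate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]

def consensus_scores(candidates, criteria, weights=None):
    """Rank candidates by the weighted mean of their normalized criteria.

    candidates: {name: {criterion: value}}
    criteria:   {criterion: True if higher is better, False if lower is better}
    """
    names = list(candidates)
    weights = weights or {c: 1.0 for c in criteria}
    total_weight = sum(weights[c] for c in criteria)
    totals = {n: 0.0 for n in names}
    for crit, higher_is_better in criteria.items():
        normalized = min_max_normalize(
            [candidates[n][crit] for n in names], higher_is_better)
        for n, score in zip(names, normalized):
            totals[n] += weights[crit] * score / total_weight
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical candidates and values, for illustration only.
criteria = {"spectral_match": True, "ri_error": False,
            "log_kow_error": False, "steric_energy": False}
candidates = {
    "A": {"spectral_match": 0.9, "ri_error": 10, "log_kow_error": 0.2, "steric_energy": 20},
    "B": {"spectral_match": 0.7, "ri_error": 50, "log_kow_error": 1.0, "steric_energy": 35},
    "C": {"spectral_match": 0.8, "ri_error": 5, "log_kow_error": 0.5, "steric_energy": 25},
}
ranking = consensus_scores(candidates, criteria)
print(ranking)
```

Unlike hard filtering, a candidate that does poorly on one criterion stays in the list and can still rank well if the other criteria support it.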


Subjects
Gas Chromatography-Mass Spectrometry/methods , Informatics/methods , Environmental Pollutants/analysis , Environmental Pollutants/chemistry , Environmental Pollutants/toxicity , Molecular Conformation , Thermodynamics
16.
Anal Chem ; 83(3): 903-12, 2011 Feb 01.
Article in English | MEDLINE | ID: mdl-21226466

ABSTRACT

The identification of unknown compounds based on GC/EI-MS spectrum and structure generation techniques has been improved by combining a number of strategies into a programmed sequence. The program MOLGEN-MS is used to determine the molecular formula and incorporate substructural information to generate all structures matching the mass spectral information. Mass spectral fragments are then predicted for each structure and compared with the experimental spectrum using a match value. Additional data are then calculated automatically for each candidate to allow exclusion of candidates that did not match other analytical information. The effectiveness of these "exclusion criteria", as well as the programming sequence, was tested using a case study of 29 isomers of formula C12H10O2. The default classifier precision resulted in the generation of too many structures in some cases, which was improved by up to several orders of magnitude by including additional classifiers or restrictions. Combining this with the exclusion of candidates based on a Lee retention index/boiling point correlation, octanol-water partitioning coefficients, steric energies, and finally spectral match values limited the number of candidate structures further from over 1 billion without any restrictions down to fewer than 6 structures in 10 cases and below 35 in all but 3 cases. This method can be used in the absence of matching database spectra and brings unknown identification based on MS interpretation and structure generation techniques a step closer to practical reality.

17.
Anal Chem ; 81(9): 3608-17, 2009 May 01.
Article in English | MEDLINE | ID: mdl-19323534

ABSTRACT

Three programs were assessed for their ability to predict mass spectral fragmentation patterns for all constitutional isomers of an experimental low-resolution electron impact mass spectrum (EI-MS), given the molecular formula, and use this information to identify the "correct structure". MOLGEN 3.5 was used to generate the structures, while all spectra were extracted from the NIST database. The commercial programs Mass Frontier and ACD MS Manager, as well as MOLGEN-MSF (developed by the University of Bayreuth) were used to generate mass spectral fragments. MOLGEN-MSF was used to generate "match values" to compare the different programs and their ability to identify the "correct structure". Although high match values could be achieved with certain settings, the ranking of the correct structure relative to other constitutional isomers was not significantly better than the results published previously and in some cases significantly worse. Furthermore, all programs showed bias toward specific structures, which changed significantly with minor changes to the program settings. Thus, advances in mass spectral fragment prediction have not necessarily improved computer aided structure elucidation (CASE) from EI-MS and indicate that caution must be used when confirming the identity of a compound only based on the match between its predicted fragments and the mass spectrum.

18.
J Chem Inf Model ; 47(6): 2345-57, 2007.
Article in English | MEDLINE | ID: mdl-17880194

ABSTRACT

y-Randomization is a tool used in validation of QSPR/QSAR models, whereby the performance of the original model in data description (r2) is compared to that of models built for permuted (randomly shuffled) response, based on the original descriptor pool and the original model building procedure. We compared y-randomization and several variants thereof, using original response, permuted response, or random number pseudoresponse and original descriptors or random number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method result in two different mean highest random r2 values. A lower one is produced by y-randomization or a variant likewise based on the original descriptors, while a higher one is obtained from variants that use random number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose to compare an original model's r2 to both of these whenever possible. The meaning of the three possible outcomes of such a double test is discussed. Often y-randomization is not available to a potential user of a model, due to the values of all descriptors in the pool for all compounds not being published. In such cases random number experiments as proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress also is reported toward the aim of obtaining the mean highest r2 of random pseudomodels by calculation rather than by tedious multiple simulations on random number variables.
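
The procedure described above can be sketched in a few lines. This is a simplified illustration, not the authors' protocol: instead of multilinear regression it selects the single best descriptor from the pool, and the data are synthetic. It computes the original r2, the mean highest r2 under y-randomization, and the mean highest r2 with random-number pseudodescriptors.

```python
import random

def r_squared(x, y):
    """Squared Pearson correlation, i.e. r2 of a one-descriptor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def best_r2(pool, y):
    """Descriptor selection: keep the single descriptor with the highest r2."""
    return max(r_squared(column, y) for column in pool)

def mean_random_r2(pool, y, rng, n_trials=200, pseudo_descriptors=False):
    """Mean highest r2 over random trials.

    pseudo_descriptors=False: permute the response (y-randomization).
    pseudo_descriptors=True:  keep y, replace the pool with random numbers.
    """
    n = len(y)
    total = 0.0
    for _ in range(n_trials):
        if pseudo_descriptors:
            trial_pool = [[rng.random() for _ in range(n)] for _ in pool]
            trial_y = y
        else:
            trial_pool = pool
            trial_y = y[:]
            rng.shuffle(trial_y)
        total += best_r2(trial_pool, trial_y)
    return total / n_trials

# Synthetic data (an assumption): 30 compounds, 10 intercorrelated descriptors.
rng = random.Random(1)
n_compounds = 30
base = [rng.gauss(0, 1) for _ in range(n_compounds)]
pool = [[b + rng.gauss(0, 0.3) for b in base] for _ in range(10)]
y = [2.0 * b + rng.gauss(0, 0.5) for b in base]

original = best_r2(pool, y)
permuted = mean_random_r2(pool, y, rng)
pseudo = mean_random_r2(pool, y, rng, pseudo_descriptors=True)
print(round(original, 3), round(permuted, 3), round(pseudo, 3))
```

A model passes this double test when its original r2 clearly exceeds both random baselines. The pseudodescriptor variant needs only the number of compounds and the pool size, which is why it remains usable when the descriptor values themselves were never published.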


Subjects
Quantitative Structure-Activity Relationship , Data Collection , Software
19.
J Chem Inf Model ; 47(3): 805-17, 2007.
Article in English | MEDLINE | ID: mdl-17532665

ABSTRACT

A general mathematical description, mostly in terms of graph theory, is given for reactions of organic chemistry. The corresponding computer program generates all products that can result from a given set of starting materials interacting according to a given set of reaction schemes. Example reactions from combinatorial chemistry, synthetic organic chemistry, and mass spectroscopic structure elucidation are considered in detail.
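
As a rough illustration of generating all products from starting materials and reaction schemes, the sketch below repeatedly applies a scheme to every ordered pair of known molecules until nothing new appears. It is a toy: molecules are plain strings rather than molecular graphs, the single esterification-like scheme works by string matching, and every name here is hypothetical rather than taken from the program the abstract describes.

```python
from itertools import product

def esterify(a, b):
    """Toy scheme: X-COOH + HO-Y -> X-COO-Y (the water by-product is ignored)."""
    if a.endswith("-COOH") and b.startswith("HO-"):
        return {a[:-len("-COOH")] + "-COO-" + b[len("HO-"):]}
    return set()

def generate_products(starting_materials, schemes, max_generations=2, max_length=40):
    """Apply every scheme to every ordered pair until no new product appears."""
    known = set(starting_materials)
    for _ in range(max_generations):
        new = set()
        for scheme, (a, b) in product(schemes, product(known, repeat=2)):
            new |= {p for p in scheme(a, b) if len(p) <= max_length}
        new -= known
        if not new:
            break
        known |= new
    return known - set(starting_materials)

# A monofunctional acid and alcohol give a single ester.
print(generate_products({"CH3-COOH", "HO-CH2-CH3"}, [esterify]))

# A bifunctional hydroxy acid keeps reacting, so generation and length limits matter.
oligomers = generate_products({"HO-CH2-COOH"}, [esterify])
print(sorted(oligomers, key=len))
```

The generation and length limits are needed because bifunctional starting materials would otherwise produce an unbounded product set. A graph-based implementation would match substructures instead of string prefixes and suffixes.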

20.
Bioorg Med Chem ; 14(15): 5178-95, 2006 Aug 01.
Article in English | MEDLINE | ID: mdl-16650995

ABSTRACT

Multilinear QSAR models are developed for the largest and most diverse set of PPARgamma agonists treated hitherto. Binding of these small molecules to the human nuclear receptor PPARgamma is described by models that are built on simple 2D molecular descriptors and nevertheless are of good quality and predictive power (e.g., 144 compounds, 10 descriptors, r2=0.79, r2(cv)=0.76). The models presented are thoroughly validated by crossvalidation, randomization experiments, bootstrapping, and training set/test set partitioning. They may therefore be helpful in the design of new antidiabetic drug candidates. For gene transactivation, the functional activity of the agonists, a corresponding model for a similarly diverse compound set is of somewhat lower statistical quality.


Subjects
PPAR gamma/agonists , PPAR gamma/genetics , Quantitative Structure-Activity Relationship , Binding Sites , Computer Simulation , Drug Design , Fatty Acids/chemistry , Fatty Acids/pharmacology , Humans , Indoles/chemistry , Indoles/pharmacology , Ligands , Molecular Structure , PPAR gamma/chemistry , Thiazolidinediones/chemistry , Thiazolidinediones/pharmacology , Transcriptional Activation/drug effects , Tyrosine/analogs & derivatives , Tyrosine/chemistry , Tyrosine/pharmacology