1.
BMC Bioinformatics ; 20(1): 19, 2019 Jan 10.
Article in English | MEDLINE | ID: mdl-30630411

ABSTRACT

BACKGROUND: In systems biology, there is an acute need for integrative approaches in heterogeneous network mining in order to exploit the continuous flux of genomic data. Simultaneous analysis of the metabolic pathways and genomic context of a given species leads to the identification of patterns consisting of reaction chains catalyzed by products of neighboring genes. Similar patterns across several species can reveal their mode of conservation throughout the tree of life. RESULTS: We present CoMetGeNe (COnserved METabolic and GEnomic NEighborhoods), a novel method that identifies metabolic and genomic patterns consisting of maximal trails of reactions catalyzed by products of neighboring genes. Patterns determined by CoMetGeNe in one species are subsequently employed in order to reflect their degree of conservation across multiple prokaryotic species. These interspecies comparisons help to improve genome annotation and can reveal putative alternative metabolic routes as well as unexpected gene ordering occurrences. CONCLUSIONS: CoMetGeNe is an exploratory tool at both the genomic and the metabolic levels, leading to insights into the conservation of functionally related clusters of neighboring enzyme-coding genes. The open-source CoMetGeNe pipeline is freely available at https://cometgene.lri.fr.


Subject(s)
Bacteria/genetics , Bacteria/metabolism , Computational Biology/methods , Genome, Bacterial , Genomics/methods , Metabolic Networks and Pathways/genetics , Software , Bacteria/classification , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Data Mining , Gene Order , Species Specificity , Systems Biology
2.
BMC Syst Biol ; 7: 99, 2013 Oct 05.
Article in English | MEDLINE | ID: mdl-24093154

ABSTRACT

BACKGROUND: Enzymes belonging to mechanistically diverse superfamilies often display similar catalytic mechanisms. We previously observed such an association in the case of the cyclic amidohydrolase superfamily whose members play a role in related steps of purine and pyrimidine metabolic pathways. To establish a possible link between enzyme homology and chemical similarity, we investigated further the neighbouring steps in the respective pathways. RESULTS: We identified that successive reactions of the purine and pyrimidine pathways display similar chemistry. These mechanistically-related reactions are often catalyzed by homologous enzymes. Detecting series of similar catalytic steps performed by successive enzyme families suggested some modularity in the architecture of the central metabolism. Accordingly, we introduce the concept of a reaction module to define at least two successive steps catalyzed by homologous enzymes in pathways alignable by similar chemical reactions. Applying such a concept allowed us to propose new functions for misannotated paralogues. In particular, we discovered a putative ureidoglycine carbamoyltransferase (UGTCase) activity. Finally, we present experimental data supporting the conclusion that this UGTCase is likely to be involved in a new route in purine catabolism. CONCLUSIONS: Using the reaction module concept should be of great value. It will help us to trace how the primordial promiscuous enzymes were assembled progressively in functional modules, as the present pathways diverged from ancestral pathways to give birth to the present-day mechanistically diversified superfamilies. In addition, the concept allows the determination of the actual function of misannotated proteins.


Subject(s)
Computational Biology/methods , Metabolic Networks and Pathways , Purines/metabolism , Carboxyl and Carbamoyl Transferases/metabolism , Dihydroorotate Dehydrogenase , Dihydrouracil Dehydrogenase (NADP)/metabolism , Glycine/analogs & derivatives , Glycine/metabolism , Oxidoreductases Acting on CH-CH Group Donors/metabolism , Phylogeny , Urea/analogs & derivatives , Urea/metabolism
3.
J Mol Evol ; 77(3): 70-80, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23979262

ABSTRACT

Dihydroorotases are universal proteins catalyzing the third step of pyrimidine biosynthesis. These zinc metalloenzymes belong to the superfamily of cyclic amidohydrolases, which also comprises other enzymes that are involved in degradation of either purines (allantoinases), pyrimidines (dihydropyrimidinases) or hydantoins (hydantoinases). The evolutionary relationships between these mechanistically related enzymes were estimated after designing a method to build an accurate multiple sequence alignment. The amino acid sequences of the proteins that have been crystallized were used to build a seed alignment. All the remaining homologues were progressively added by aligning their HMM profiles to the seed HMM profile, which yielded a reliable phylogeny of the superfamily. This helped us to propose a new evolutionary classification of dihydroorotases into three major types, while at the same time disentangling an important part of the history of their complex structure-function relationships. Although differing in their substrate specificity, allantoinases, hydantoinases and dihydropyrimidinases are found to be phylogenetically closer to DHOase Type I than the three DHOase types are to each other. This suggests that the primordial cyclic amidohydrolase was a multifunctional, highly evolvable generalist, with high conformational diversity allowing for promiscuous activities. Then, successive gene duplications allowed the primordial substrate ambiguity to be resolved into various substrate specificities. The present-day superfamily of cyclic amidohydrolases is the result of the progressive divergence of these ancestral paralogous copies by descent with modification.


Subject(s)
Amidohydrolases/chemistry , Amidohydrolases/classification , Amidohydrolases/genetics , Evolution, Molecular , Amidohydrolases/metabolism , Phylogeny , Pyrimidines/biosynthesis , Sequence Alignment , Substrate Specificity
4.
BMC Genomics ; 11: 81, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-20122162

ABSTRACT

BACKGROUND: More and more completely sequenced fungal genomes are becoming available and many more sequencing projects are in progress. This deluge of data should improve our knowledge of the various primary and secondary metabolisms of Fungi, including their synthesis of useful compounds such as antibiotics or toxic molecules such as mycotoxins. Functional annotation of many fungal genomes is imperfect, especially of genes encoding enzymes, so we need dedicated tools to analyze their metabolic pathways in depth. DESCRIPTION: FUNGIpath is a new tool built using a two-stage approach. Groups of orthologous proteins predicted using complementary methods of detection were collected in a relational database. Each group was further mapped on to steps in the metabolic pathways published in the public databases KEGG and MetaCyc. As a result, FUNGIpath allows the primary and secondary metabolisms of the different fungal species represented in the database to be compared easily, making it possible to assess the level of specificity of various pathways at different taxonomic distances. It is freely accessible at http://www.fungipath.u-psud.fr. CONCLUSIONS: As more and more fungal genomes are expected to be sequenced during the coming years, FUNGIpath should help progressively to reconstruct the ancestral primary and secondary metabolisms of the main branches of the fungal tree of life and to elucidate the evolution of these ancestral fungal metabolisms to various specific derived metabolisms.


Subject(s)
Computational Biology/methods , Databases, Protein , Genome, Fungal , Metabolic Networks and Pathways , Data Mining , Fungi/genetics
5.
J Mol Evol ; 69(5): 470-80, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19784557

ABSTRACT

Contrary to a widespread opinion, horizontal gene transfer (HGT) between distantly related microorganisms (such as Bacteria and Archaea) has not been demonstrated to occur on a large scale. Except for transfer of mobile elements between closely related organisms, most alleged HGT events reflect phylogenetic discrepancies that can be explained by a variety of artefacts or by the differential loss of paralogous gene copies either originally present in the Last Universal Common Ancestor (LUCA) to the three Domains (a sophisticated, genetically redundant and promiscuous community of protoeukaryotes), or created by duplications having occurred at later times. Besides, (i) there is no experimental evidence for the facile acquisition of foreign DNA between distant taxa and (ii) important biological constraints operate on the phenotypic success of genetic exchange at several levels, including protein-protein interactions involved in metabolic channelling; stable integration and expression of foreign DNA is, therefore, expected to require strong selection. Explaining phylogenetic discrepancies by artefacts or loss of paralogs does not eliminate difficulties in retracing species genealogy but maintains the picture of a universal tree of life, HGT between distant organisms being reduced to a trickle. We illustrate our thesis by the phylogenetic analysis of carbamoyltransferases, a family of paralogous proteins. Among higher eukaryotes HGT appears of limited scope except in asexual organisms. We suggest that meiotic sexuality (a hallmark of eukaryotes) emerged in the genetically redundant and protoeukaryotic LUCA as a molecular identity check providing a defence mechanism against the deleterious effects of HGT.


Subject(s)
Archaea/genetics , Bacteria/genetics , Biological Evolution , Eukaryota , Gene Transfer, Horizontal , Meiosis/genetics , Animals , Archaea/cytology , Bacteria/cytology , Carboxyl and Carbamoyl Transferases/genetics , Computational Biology , Eukaryota/cytology , Eukaryota/genetics , Evolution, Molecular , Humans , Phylogeny , Species Specificity
6.
Res Microbiol ; 160(7): 522-8, 2009 Sep.
Article in English | MEDLINE | ID: mdl-19524037

ABSTRACT

A complete tree with roots, trunk and crown remains an appropriate model to represent all steps of life's development, from the emergence of a unique genetic code up to the last universal common ancestor and its further radiation. Catalytic closure of a mixture of prebiotic polymers is a heuristic alternative to the RNA world. Conjectures about emergence of life in an infinite multiverse should not confuse probability with possibility.


Subject(s)
Evolution, Molecular , Origin of Life , Evolution, Chemical , Genetic Code , Models, Biological , RNA/genetics
7.
BMC Bioinformatics ; 9: 536, 2008 Dec 16.
Article in English | MEDLINE | ID: mdl-19087285

ABSTRACT

BACKGROUND: It has been repeatedly observed that gene order is rapidly lost in prokaryotic genomes. However, persistent synteny blocks are found when comparing more or less distant species. These genes that remain consistently adjacent are appealing candidates for the study of genome evolution and a more accurate definition of their functional role. Such studies require visualizing conserved synteny blocks in a large number of genomes at all taxonomic distances. RESULTS: After comparing nearly 600 completely sequenced genomes encompassing the whole prokaryotic tree of life, the computed synteny data were assembled in a relational database, SynteBase. SynteView was designed to visualize conserved synteny blocks in a large number of genomes after choosing one of them as a reference. SynteView functions with data stored either in SynteBase or in a home-made relational database of personal data. In addition, this software can compute on-the-fly and display the distribution of synteny blocks which are conserved in pairs of genomes. This tool has been designed to provide a wealth of information on each positional orthologous gene, to be user-friendly and customizable. It is also possible to download sequences of genes belonging to these synteny blocks for further studies. SynteView is accessible through Java Webstart at http://www.synteview.u-psud.fr. CONCLUSION: SynteBase answers queries about gene order conservation and SynteView visualizes the obtained results in a flexible and powerful way which provides a comparative overview of the conserved synteny in a large number of genomes, whatever their taxonomic distances.


Subject(s)
Gene Order/genetics , Genome, Archaeal , Genome, Bacterial , Software , Synteny/genetics , Computational Biology/methods , Conserved Sequence , Databases, Genetic , Evolution, Molecular , Genomics/methods
8.
BMC Genomics ; 9: 501, 2008 Oct 24.
Article in English | MEDLINE | ID: mdl-18950477

ABSTRACT

BACKGROUND: Curated databases of completely sequenced genomes have been designed independently at the NCBI (RefSeq) and EBI (Genome Reviews) to cope with non-standard annotation found in the version of the sequenced genome that has been published by databanks GenBank/EMBL/DDBJ. These curation attempts were expected to review the annotations and to improve their pertinence when using them to annotate newly released genome sequences by homology to previously annotated genomes. However, we observed that such an uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence in both databases. Secondly, the two reannotated versions of the same genome differ at the level of their structural annotation. RESULTS: Here, we propose CorBank, a program devised to provide cross-referencing protein identifiers no matter what level of identity is found between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank further allows detecting any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we have analyzed. In all other cases there are differences occurring at the level of the coding sequence (CDS), and/or in the total number of CDS in the respective version of the same genome. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also contains an updated publication of the exhaustive results obtained by comparing RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy search of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon. CONCLUSION: CorBank is very efficient in rapid detection of the numerous differences existing between RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and attempting to publish the point of view of the other database whenever it is technically possible.


Subject(s)
Computational Biology/methods , Database Management Systems , Databases, Nucleic Acid , Databases, Protein , Genomics/methods , Sequence Alignment
9.
Biol Direct ; 3: 29, 2008 Jul 09.
Article in English | MEDLINE | ID: mdl-18613974

ABSTRACT

BACKGROUND: Since the reclassification of all life forms in three Domains (Archaea, Bacteria, Eukarya), the identity of their alleged forerunner (Last Universal Common Ancestor or LUCA) has been the subject of extensive controversies: progenote or already complex organism, prokaryote or protoeukaryote, thermophile or mesophile, product of a protracted progression from simple replicators to complex cells or born in the cradle of "catalytically closed" entities? We present a critical survey of the topic and suggest a scenario. RESULTS: LUCA does not appear to have been a simple, primitive, hyperthermophilic prokaryote but rather a complex community of protoeukaryotes with an RNA genome, adapted to a broad range of moderate temperatures, genetically redundant, morphologically and metabolically diverse. LUCA's genetic redundancy predicts loss of paralogous gene copies in divergent lineages to be a significant source of phylogenetic anomalies, i.e. instances where a protein tree departs from the SSU-rRNA genealogy; consequently, horizontal gene transfer may not have the rampant character assumed by many. Examining membrane lipids suggests LUCA had sn1,2 ester fatty acid lipids from which Archaea emerged from the outset as thermophilic by "thermoreduction," with a new type of membrane, composed of sn2,3 ether isoprenoid lipids; this occurred without major enzymatic reconversion. Bacteria emerged by reductive evolution from LUCA and some lineages further acquired extreme thermophily by convergent evolution. This scenario is compatible with the hypothesis that the RNA to DNA transition resulted from different viral invasions as proposed by Forterre. Beyond the controversy opposing "replication first" to "metabolism first", the predictive arguments of theories on "catalytic closure" or "compositional heredity" heavily weigh in favour of LUCA's ancestors having emerged as complex, self-replicating entities from which a genetic code arose under natural selection. CONCLUSION: Life was born complex and the LUCA displayed that heritage. It had the "body" of a mesophilic eukaryote well before maturing by endosymbiosis into an organism adapted to an atmosphere rich in oxygen. Abundant indications suggest reductive evolution of this complex and heterogeneous entity towards the "prokaryotic" Domains Archaea and Bacteria. The word "prokaryote" should be abandoned because it is epistemologically unsound. REVIEWERS: This article was reviewed by Anthony Poole, Patrick Forterre, and Nicolas Galtier.


Subject(s)
Archaea/physiology , Bacteria/genetics , Biological Evolution , Eukaryotic Cells/physiology , Phylogeny , Prokaryotic Cells/physiology , Archaea/cytology , Archaea/genetics , Archaea/metabolism , Bacteria/cytology , Bacteria/metabolism , Eukaryotic Cells/cytology , Eukaryotic Cells/metabolism , Prokaryotic Cells/cytology , Prokaryotic Cells/metabolism
10.
Bioinformatics ; 24(13): i322-9, 2008 Jul 01.
Article in English | MEDLINE | ID: mdl-18586731

ABSTRACT

MOTIVATION: We have to cope with both a deluge of new genome sequences and a huge amount of data produced by high-throughput approaches used to exploit these genomic features. Crossing and comparing such heterogeneous and disparate data will help improve functional annotation of genomes. This requires designing elaborate integration systems such as warehouses for storing and querying these data. RESULTS: We have designed a relational genomic warehouse with an original multi-layer architecture made of a databases layer and an entities layer. We describe a new querying module, GenoQuery, which is based on this architecture. We use the entities layer to define mixed queries. These mixed queries allow searching for instances of biological entities and their properties in the different databases, without specifying in which database they should be found. Accordingly, we further introduce the central notion of alternative queries. Such queries have the same meaning as the original mixed queries, while exploiting complementarities yielded by the various integrated databases of the warehouse. We explain how GenoQuery computes all the alternative queries of a given mixed query. We illustrate how useful this querying module is by means of a thorough example. AVAILABILITY: http://www.lri.fr/~lemoine/GenoQuery/.


Subject(s)
Chromosome Mapping/methods , Database Management Systems , Databases, Genetic , Documentation/methods , Information Storage and Retrieval/methods , Internet , User-Computer Interface
11.
Biochimie ; 90(4): 595-608, 2008 Apr.
Article in English | MEDLINE | ID: mdl-17961904

ABSTRACT

The incredible development of comparative genomics during the last decade has required a correct use of the concept of homology that was previously utilized only by evolutionary biologists. Unfortunately, this concept has often been misunderstood and thus misused when exploited outside its evolutionary context. This review returns to the correct definition of homology and explains how this definition has been progressively refined in order to adapt it to the various new kinds of analysis of gene properties and of their products that appear with the progress of comparative genomics. Then, we illustrate the power and the proficiency of such a concept when using the available genomics data in order to study the evolution of individual genes, of entire genomes and of species, respectively. After explaining how we detect homologues by an exhaustive comparison of a hundred complete proteomes, we describe three main lines of research we have developed in recent years. The first one exploits synteny and gene context data to better understand the mechanisms of genome evolution in prokaryotes. The second one is based on phylogenomics approaches to reconstruct the tree of life. The last one is devoted to recalling that protein homology is often limited to structural segments (SOH=segment of homology or module). Detecting and numbering modules allows tracing back protein history by identifying the events of gene duplication and gene fusion. We insist that one of the main present difficulties in such studies is the lack of a reliable method to identify genuine orthologues. Finally, we show how these homology studies are helpful to annotate genes and genomes and to study the complexity of the relationships between sequence and function of a gene.


Subject(s)
Evolution, Molecular , Genes/genetics , Genome , Genomics , Animals , Bacteria/classification , Bacteria/genetics , Phylogeny , Proteome/analysis , Sequence Homology, Nucleic Acid
12.
BMC Evol Biol ; 7: 237, 2007 Nov 29.
Article in English | MEDLINE | ID: mdl-18047665

ABSTRACT

BACKGROUND: Comparison of completely sequenced microbial genomes has revealed how fluid these genomes are. Detecting synteny blocks requires reliable methods to determine the orthologs among the whole set of homologs detected by exhaustive comparisons between each pair of completely sequenced genomes. This is a complex and difficult problem in the field of comparative genomics but will help to better understand the way prokaryotic genomes are evolving. RESULTS: We have developed a suite of programs that automate three essential steps to study conservation of gene order, and validated them with a set of 107 bacteria and archaea that cover the majority of the prokaryotic taxonomic space. We identified the whole set of shared homologs between two or more species and computed the evolutionary distance separating each pair of homologs. We applied two strategies to extract from the set of homologs a collection of valid orthologs shared by at least two genomes. The first computes the Reciprocal Smallest Distance (RSD) using the PAM distances separating pairs of homologs. The second method groups homologs in families and reconstructs each family's evolutionary tree, distinguishing bona fide orthologs as well as paralogs created after the last speciation event. Although the phylogenetic tree method often succeeds where RSD fails, the reverse could occasionally be true. Accordingly, we used the data obtained with either method or their intersection to number the orthologs that are adjacent in each pair of genomes, the Positional Orthologous Genes (POGs), and to further study their properties. Once all these synteny blocks had been detected, we showed that POGs are subject to more evolutionary constraints than orthologs outside synteny groups, whatever the taxonomic distance separating the compared organisms. CONCLUSION: The suite of programs described in this paper allows a reliable detection of orthologs and is useful for evaluating gene order conservation in prokaryotes whatever their taxonomic distance. Thus, our approach will facilitate the rapid identification of POGs in the next few years as we are expecting to be inundated with thousands of completely sequenced microbial genomes.


Subject(s)
Archaea/genetics , Bacteria/genetics , Evolution, Molecular , Genes, Archaeal , Genes, Bacterial , Synteny , Algorithms , Cluster Analysis , Phylogeny , Proteome , Species Specificity
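The reciprocal-smallest-distance idea mentioned in the abstract above can be illustrated with a small sketch. It is not the authors' program: the function name, the input layout (a dictionary of precomputed pairwise distances between the genes of two genomes) and the toy values are assumptions made for illustration only, and the sketch leaves out how the PAM distances themselves would be estimated.

```python
def reciprocal_smallest_distance(distances):
    """Return gene pairs (a, b) that are each other's closest homolog.

    `distances` is a hypothetical input: a dict mapping
    (gene_in_genome_A, gene_in_genome_B) -> evolutionary distance.
    """
    best_for_a = {}  # gene in A -> (closest gene in B, distance)
    best_for_b = {}  # gene in B -> (closest gene in A, distance)
    for (a, b), d in distances.items():
        if a not in best_for_a or d < best_for_a[a][1]:
            best_for_a[a] = (b, d)
        if b not in best_for_b or d < best_for_b[b][1]:
            best_for_b[b] = (a, d)
    # Keep a pair only when the choice is reciprocal.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Toy distances (made-up values, not data from the paper).
toy = {
    ("a1", "b1"): 12.0,
    ("a1", "b2"): 40.0,
    ("a2", "b1"): 35.0,
    ("a2", "b2"): 15.0,
    ("a3", "b2"): 20.0,
}
pairs = reciprocal_smallest_distance(toy)
print(pairs)
```

In this toy input, a3 is closest to b2, but b2 is closest to a2, so a3 is left without a putative ortholog. Cases like this are where a tree-based method, as described in the abstract, might reach a different conclusion.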
13.
Microbiol Mol Biol Rev ; 71(1): 36-47, 2007 Mar.
Article in English | MEDLINE | ID: mdl-17347518

ABSTRACT

Major aspects of the pathway of de novo arginine biosynthesis via acetylated intermediates in microorganisms must be revised in light of recent enzymatic and genomic investigations. The enzyme N-acetylglutamate synthase (NAGS), which used to be considered responsible for the first committed step of the pathway, is present in a limited number of bacterial phyla only and is absent from Archaea. In many Bacteria, shorter proteins related to the Gcn5-related N-acetyltransferase family appear to acetylate L-glutamate; some are clearly similar to the C-terminal, acetyl-coenzyme A (CoA) binding domain of classical NAGS, while others are more distantly related. Short NAGSs can be single gene products, as in Mycobacterium spp. and Thermus spp., or fused to the enzyme catalyzing the last step of the pathway (argininosuccinase), as in members of the Alteromonas-Vibrio group. How these proteins bind glutamate remains to be determined. In some Bacteria, a bifunctional ornithine acetyltransferase (i.e., using both acetylornithine and acetyl-CoA as donors of the acetyl group) accounts for glutamate acetylation. In many Archaea, the enzyme responsible for glutamate acetylation remains elusive, but possible connections with a novel lysine biosynthetic pathway arose recently from genomic investigations. In some Proteobacteria (notably Xanthomonadaceae) and Bacteroidetes, the carbamoylation step of the pathway appears to involve N-acetylornithine or N-succinylornithine rather than ornithine. The product N-acetylcitrulline is deacetylated by an enzyme that is also involved in the provision of ornithine from acetylornithine; this is an important metabolic function, as ornithine itself can become essential as a source of other metabolites. This review emphasizes the biochemical and evolutionary implications of these findings.


Subject(s)
Arginine/biosynthesis , Bacteria/metabolism , Acetylation , Acetyltransferases/genetics , Acetyltransferases/metabolism , Arginine/metabolism , Bacteria/classification , Bacteria/genetics , Biosynthetic Pathways , Evolution, Molecular , Glutamates/metabolism , Ornithine/metabolism , Phylogeny
14.
BMC Bioinformatics ; 7: 436, 2006 Oct 06.
Article in English | MEDLINE | ID: mdl-17026747

ABSTRACT

BACKGROUND: Despite the current availability of several hundreds of thousands of amino acid sequences, more than 36% of the enzyme activities (EC numbers) defined by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) are not associated with any amino acid sequence in major public databases. This wide gap separating knowledge of biochemical function and sequence information is found for nearly all classes of enzymes. Thus, there is an urgent need to explore these sequence-less EC numbers, in order to progressively close this gap. DESCRIPTION: We designed ORENZA, a PostgreSQL database of ORphan ENZyme Activities, to collate information about the EC numbers defined by the NC-IUBMB with specific emphasis on orphan enzyme activities. Complete lists of all EC numbers and of orphan EC numbers are available and will be periodically updated. ORENZA allows one to browse the complete list of EC numbers or the subset associated with orphan enzymes or to query a specific EC number, an enzyme name or a species name for those interested in particular organisms. It is possible to search ORENZA for the different biochemical properties of the defined enzymes, the metabolic pathways in which they participate, the taxonomic data of the organisms whose genomes encode them, and many other features. The association of an enzyme activity with an amino acid sequence is clearly underlined, making it easy to identify at once the orphan enzyme activities. Interactive publishing of suggestions by the community would provide expert evidence for re-annotation of orphan EC numbers in public databases. CONCLUSION: ORENZA is a Web resource designed to progressively bridge the unwanted gap between function (enzyme activities) and sequence (dataset present in public databases). ORENZA should increase interactions between communities of biochemists and of genomicists. This is expected to reduce the number of orphan enzyme activities by allocating gene sequences to the relevant enzymes.


Subject(s)
Databases, Genetic , Enzymes/genetics , Internet , Amino Acid Sequence/genetics , Internet/statistics & numerical data , Sequence Analysis, Protein/methods
16.
Drug Discov Today ; 11(7-8): 300-5, 2006 Apr.
Article in English | MEDLINE | ID: mdl-16580971

ABSTRACT

Despite the immense progress of genomics, and the current availability of several hundreds of thousands of amino acid sequences, >39% of well-defined enzyme activities (as represented by enzyme commission, EC, numbers) are not associated with any sequence. There is an urgent need to explore the 1525 orphan enzymes (enzymes having EC numbers without an associated sequence) to bridge the wide gap that separates knowledge of biochemical function and sequence information. Strikingly, orphan enzymes can even be found among enzymatic activities successfully used as drug targets. Here, knowledge of sequence would help to develop molecular-targeted therapies, suppressing many drug-related side-effects.


Subject(s)
Drug Design , Enzyme Inhibitors/pharmacology , Enzymes/chemistry , Animals , Enzyme Inhibitors/chemistry , Enzymes/genetics , Genomics , Humans , Sequence Analysis, Protein
17.
BMC Genomics ; 7: 4, 2006 Jan 12.
Article in English | MEDLINE | ID: mdl-16409639

ABSTRACT

BACKGROUND: The N-acetylation of L-glutamate is regarded as a universal metabolic strategy to commit glutamate towards arginine biosynthesis. Until recently, this reaction was thought to be catalyzed by either of two enzymes: (i) the classical N-acetylglutamate synthase (NAGS, gene argA) first characterized in Escherichia coli and Pseudomonas aeruginosa several decades ago and also present in vertebrates, or (ii) the bifunctional version of ornithine acetyltransferase (OAT, gene argJ) present in Bacteria, Archaea and many Eukaryotes. This paper focuses on a new and surprising aspect of glutamate acetylation. We recently showed that in Moritella abyssi and M. profunda, two marine gamma proteobacteria, the gene for the last enzyme in arginine biosynthesis (argH) is fused to a short sequence that corresponds to the C-terminal, N-acetyltransferase-encoding domain of NAGS and is able to complement an argA mutant of E. coli. Very recently, other authors identified in Mycobacterium tuberculosis an independent gene corresponding to this short C-terminal domain and coding for a new type of NAGS. We have investigated the two prokaryotic Domains for patterns of gene-enzyme relationships in the first committed step of arginine biosynthesis. RESULTS: The argH-A fusion, designated argH(A), and discovered in Moritella was found to be present in (and confined to) marine gamma proteobacteria of the Alteromonas- and Vibrio-like group. Most of them have a classical NAGS with the exception of Idiomarina loihiensis and Pseudoalteromonas haloplanktis which nevertheless can grow in the absence of arginine and therefore appear to rely on the arg(A) sequence for arginine biosynthesis. Screening prokaryotic genomes for virtual argH-X 'fusions' where X stands for a homologue of arg(A), we retrieved a large number of Bacteria and several Archaea, all of them devoid of a classical NAGS. In the case of Thermus thermophilus and Deinococcus radiodurans, the arg(A)-like sequence clusters with argH in an operon-like fashion. In this group of sequences, we find the short novel NAGS of the type identified in M. tuberculosis. Among these organisms, at least Thermus, Mycobacterium and Streptomyces species appear to rely on this short NAGS version for arginine biosynthesis. CONCLUSION: The gene-enzyme relationship for the first committed step of arginine biosynthesis should now be considered in a new perspective. In addition to bifunctional OAT, nature appears to implement at least three alternatives for the acetylation of glutamate. It is possible to propose evolutionary relationships between them starting from the same ancestral N-acetyltransferase domain. In M. tuberculosis and many other bacteria, this domain evolved as an independent enzyme, whereas it fused either with a carbamate kinase fold to give the classical NAGS (as in E. coli) or with argH as in marine gamma proteobacteria. Moreover, there is an urgent need to clarify the current nomenclature since the same gene name argA has been used to designate structurally different entities. Clarifying the confusion would help to prevent erroneous genomic annotation.


Subject(s)
Arginine/biosynthesis , Gammaproteobacteria/genetics , Gammaproteobacteria/metabolism , Acetylation , Acetyltransferases/genetics , Acetyltransferases/metabolism , Amino Acid Sequence , Argininosuccinate Lyase/genetics , Argininosuccinate Lyase/metabolism , Computational Biology , Conserved Sequence , Evolution, Molecular , Gammaproteobacteria/classification , Gammaproteobacteria/enzymology , Gene Fusion , Genes, Bacterial , Genomics , Marine Biology , Models, Biological , Molecular Sequence Data , Moritella/enzymology , Moritella/genetics , Phylogeny , Prokaryotic Cells , Sequence Homology, Amino Acid
18.
Science ; 307(5706): 42, 2005 Jan 07.
Article in English | MEDLINE | ID: mdl-15637255
20.
BMC Genomics ; 5(1): 52, 2004 Aug 02.
Article in English | MEDLINE | ID: mdl-15287962

ABSTRACT

BACKGROUND: Annotating genomes remains a hazardous task. Mistakes or gaps in such a complex process may occur when relevant knowledge is ignored, whether lost, forgotten or overlooked. This paper exemplifies an approach which could help to resuscitate such meaningful data. RESULTS: We show that a set of closely related sequences which have been annotated as ornithine carbamoyltransferases are actually putrescine carbamoyltransferases. This demonstration is based on the following points: (i) use of enzymatic data which had been overlooked, (ii) rediscovery of a short NH2-terminal sequence that allows a wrongly annotated ornithine carbamoyltransferase to be reannotated as a putrescine carbamoyltransferase, (iii) identification of conserved motifs that distinguish unambiguously between the two kinds of carbamoyltransferases, and (iv) comparative study of the gene context of these different sequences. CONCLUSIONS: We explain why this specific case of misannotation had not yet been described and draw attention to the fact that analogous instances must be rather frequent. We urge special caution when high sequence similarity is coupled with an apparent lack of biochemical information. Moreover, from the point of view of genome annotation, proteins which have been studied experimentally but are not correlated with sequence data in current databases qualify as "orphans", just as unassigned genomic open reading frames do. The strategy we used in this paper to bridge such gaps in knowledge could work whenever it is possible to collect a body of facts about experimental data, homology, unnoticed sequence data, and accurate information about gene context.


Subject(s)
Carboxyl and Carbamoyl Transferases/classification , Databases, Protein , Ornithine Carbamoyltransferase/classification , Sequence Homology, Amino Acid , Amino Acid Motifs , Amino Acid Sequence , Bacterial Proteins/chemistry , Bacterial Proteins/classification , Bacterial Proteins/genetics , Carboxyl and Carbamoyl Transferases/chemistry , Carboxyl and Carbamoyl Transferases/genetics , Enterococcus faecalis/enzymology , Enterococcus faecalis/genetics , Evolution, Molecular , Gene Order , Genes, Bacterial , Lactobacillus/enzymology , Lactobacillus/genetics , Listeria monocytogenes/enzymology , Listeria monocytogenes/genetics , Molecular Sequence Data , Multigene Family , Mycoplasma mycoides/enzymology , Mycoplasma mycoides/genetics , Ornithine Carbamoyltransferase/chemistry , Ornithine Carbamoyltransferase/genetics , Pediococcus/enzymology , Pediococcus/genetics , Phylogeny , Species Specificity , Streptococcus mutans/enzymology , Streptococcus mutans/genetics , Structure-Activity Relationship