1.
Nucleic Acids Res ; 52(D1): D67-D71, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971299

ABSTRACT

The Bioinformation and DNA Data Bank of Japan (DDBJ) Center (https://www.ddbj.nig.ac.jp) provides database archives that cover a wide range of fields in life sciences. As a founding member of the International Nucleotide Sequence Database Collaboration (INSDC), DDBJ accepts and distributes nucleotide sequence data as well as their study and sample information along with the National Center for Biotechnology Information in the United States and the European Bioinformatics Institute (EBI). Besides INSDC databases, the DDBJ Center provides databases for functional genomics (GEA: Genomic Expression Archive), metabolomics (MetaboBank) and human genetic and phenotypic data (JGA: Japanese Genotype-phenotype Archive). These database systems have been built on the National Institute of Genetics (NIG) supercomputer, which is also open for domestic life science researchers to analyze large-scale sequence data. This paper reports recent updates on the archival databases and the services of the DDBJ Center, highlighting the newly redesigned MetaboBank. MetaboBank uses BioProject and BioSample in its metadata description making it suitable for multi-omics large studies. Its collaboration with MetaboLights at EBI brings synergy in locating and reusing public data.


Subject(s)
Databases, Nucleic Acid , Metabolomics , Metadata , Humans , Computational Biology , Genomics , Internet , Japan , Multiomics/methods
2.
F1000Res ; 9: 136, 2020.
Article in English | MEDLINE | ID: mdl-32308977

ABSTRACT

We report on the activities of the 2015 edition of the BioHackathon, an annual event that brings together researchers and developers from around the world to develop tools and technologies that promote the reusability of biological data. We discuss issues surrounding the representation, publication, integration, mining and reuse of biological data and metadata across a wide range of biomedical data types of relevance for the life sciences, including chemistry, genotypes and phenotypes, orthology and phylogeny, proteomics, genomics, glycomics, and metabolomics. We describe our progress to address ongoing challenges to the reusability and reproducibility of research results, and identify outstanding issues that continue to impede the progress of bioinformatics research. We share our perspective on the state of the art, continued challenges, and goals for future research and development for the life sciences Semantic Web.


Subject(s)
Biological Science Disciplines , Computational Biology , Semantic Web , Data Mining , Metadata , Reproducibility of Results
3.
Sci Rep ; 7: 43368, 2017 03 06.
Article in English | MEDLINE | ID: mdl-28262809

ABSTRACT

Although host-plant selection is a central topic in ecology, its general underpinnings are poorly understood. Here, we performed a case study focusing on the publicly available data on Japanese butterflies. A combined statistical analysis of plant-herbivore relationships and taxonomy revealed that some butterfly subfamilies in different families feed on the same plant families, and that this phenomenon occurs more often than expected by chance, thus indicating the independent acquisition of adaptive phenotypes to the same hosts. We consequently integrated plant-herbivore and plant-compound relationship data and conducted a statistical analysis to identify compounds unique to host plants of specific butterfly families. Some of the identified plant compounds are known to attract certain butterfly groups while repelling others. The additional incorporation of insect-compound relationship data revealed potential metabolic processes that are related to host plant selection. Our results demonstrate that data integration enables the computational detection of compounds putatively involved in particular interspecies interactions and that further data enrichment and integration of genomic and transcriptomic data facilitates the unveiling of the molecular mechanisms involved in host plant selection.


Subject(s)
Butterflies/physiology , Computational Biology/methods , Feeding Behavior , Plants/parasitology , Animals , Chemotactic Factors/analysis , Insect Repellents/analysis , Phytochemicals/analysis , Plants/chemistry
4.
J Chem Inf Model ; 56(3): 510-6, 2016 Mar 28.
Article in English | MEDLINE | ID: mdl-26822930

ABSTRACT

Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a big gap in the numbers between experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies that estimate the number of candidate enzyme genes, these studies required some additional information aside from the structures of metabolites such as gene expression and order in the genome. In this study, we developed a novel method to identify a candidate enzyme gene of a reaction using the chemical structures of the substrate-product pair (reactant pair). The proposed method is based on a search for similar reactant pairs in a reference database and offers ortholog groups that possibly mediate the given reaction. We applied the proposed method to two experimentally validated reactions. As a result, we confirmed that the histidine transaminase was correctly identified. Although our method could not directly identify the asparagine oxo-acid transaminase, we successfully found the paralog gene most similar to the correct enzyme gene. We also applied our method to infer candidate enzyme genes in the mesaconate pathway. The advantage of our method lies in the prediction of possible genes for orphan enzyme reactions where any associated gene sequences are not determined yet. We believe that this approach will facilitate experimental identification of genes for orphan enzymes.


Subject(s)
Enzymes/genetics , Databases, Protein , Enzymes/metabolism , Substrate Specificity
5.
Plant Physiol ; 168(1): 47-59, 2015 May.
Article in English | MEDLINE | ID: mdl-25761715

ABSTRACT

Grape (Vitis vinifera) accumulates various polyphenolic compounds, which protect against environmental stresses, including ultraviolet-C (UV-C) light and pathogens. In this study, we looked at the transcriptome and metabolome in grape berry skin after UV-C irradiation, which demonstrated the effectiveness of omics approaches to clarify important traits of grape. We performed transcriptome analysis using a genome-wide microarray, which revealed 238 genes up-regulated more than 5-fold by UV-C light. Enrichment analysis of Gene Ontology terms showed that genes encoding stilbene synthase, a key enzyme for resveratrol synthesis, were enriched in the up-regulated genes. We performed metabolome analysis using liquid chromatography-quadrupole time-of-flight mass spectrometry, and 2,012 metabolite peaks, including unidentified peaks, were detected. Principal component analysis using the peaks showed that only one metabolite peak, identified as resveratrol, was highly induced by UV-C light. We updated the metabolic pathway map of grape in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and in the KaPPA-View 4 KEGG system, then projected the transcriptome and metabolome data on a metabolic pathway map. The map showed specific induction of the resveratrol synthetic pathway by UV-C light. Our results showed that multiomics is a powerful tool to elucidate the accumulation mechanisms of secondary metabolites, and updated systems, such as KEGG and KaPPA-View 4 KEGG for grape, can support such studies.


Subject(s)
Biosynthetic Pathways , Fruit/genetics , Gene Expression Profiling , Metabolomics , Stilbenes/metabolism , Ultraviolet Rays , Vitis/genetics , Biosynthetic Pathways/radiation effects , Calibration , Darkness , Fluorescence , Fruit/metabolism , Fruit/radiation effects , Gene Ontology , Genes, Plant , Metabolome/genetics , Metabolome/radiation effects , Molecular Sequence Annotation , Principal Component Analysis , Secondary Metabolism/genetics , Secondary Metabolism/radiation effects , Vitis/metabolism , Vitis/radiation effects
6.
J Bioinform Comput Biol ; 12(6): 1442001, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25385078

ABSTRACT

Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by IUBMB, and has been made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies. Instead, most of the bioinformatics studies simply use the identifiers of the enzymes, i.e. the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List, and demonstrated that the partial characteristics of reactions cannot be retrieved by simply using EC numbers. Thus, we developed a novel ontology, named PIERO, for annotating biochemical transformations as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List, and was grouped into those related to overall reactions and biochemical transformations. Consequently, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. This ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from the given transformations. Further future development of the PIERO ontology should enhance the Enzyme List to promote the integration of genomics and metabolomics.


Subject(s)
Biological Ontologies , Databases, Protein , Enzymes/chemistry , Enzymes/classification , Information Storage and Retrieval/methods , Terminology as Topic , Enzymes/genetics , Natural Language Processing
7.
Bioinformatics ; 30(12): i165-74, 2014 Jun 15.
Article in English | MEDLINE | ID: mdl-24931980

ABSTRACT

MOTIVATION: Metabolic pathway analysis is crucial not only in metabolic engineering but also in rational drug design. However, the biosynthetic/biodegradation pathways are known only for a small portion of metabolites, and a vast amount of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome-scale. RESULTS: In this article, we develop a novel method to predict the multistep reaction sequences for de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as 'multistep reaction sequence likeness', i.e. whether a compound-compound pair is possibly converted to each other by a sequence of enzymatic reactions. In the algorithm, we propose a recursive procedure of using step-specific classifiers to predict the intermediate compounds in the multistep reaction sequences, based on chemical substructure fingerprints/descriptors of compounds. We further demonstrate the usefulness of our proposed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set and discuss characteristic features of the extracted chemical substructure transformation patterns in multistep reaction sequences. Our comprehensively predicted reaction networks help to fill the metabolic gap and to infer new reaction sequences in metabolic pathways. AVAILABILITY AND IMPLEMENTATION: Materials are available for free at http://web.kuicr.kyoto-u.ac.jp/supp/kot/ismb2014/


Subject(s)
Metabolic Networks and Pathways , Metabolome , Metabolomics/methods , Algorithms , Support Vector Machine
8.
Bioinformatics ; 29(13): i135-44, 2013 Jul 01.
Article in English | MEDLINE | ID: mdl-23812977

ABSTRACT

MOTIVATION: The metabolic pathway is an important biochemical reaction network involving enzymatic reactions among chemical compounds. However, it is assumed that a large number of metabolic pathways remain unknown, and many reactions are still missing even in known pathways. Therefore, the most important challenge in metabolomics is the automated de novo reconstruction of metabolic pathways, which includes the elucidation of previously unknown reactions to bridge the metabolic gaps. RESULTS: In this article, we develop a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. We define feature vectors representing the chemical transformation patterns of compound-compound pairs in enzymatic reactions using chemical fingerprints. We apply a sparsity-induced classifier to learn what we refer to as 'enzymatic-reaction likeness', i.e. whether compound pairs are possibly converted to each other by enzymatic reactions. The originality of our method lies in the search for potential reactions among many compounds at a time, in the extraction of reaction-related chemical transformation patterns and in the large-scale applicability owing to the computational efficiency. In the results, we demonstrate the usefulness of our proposed method on the de novo reconstruction of 134 metabolic pathways in Kyoto Encyclopedia of Genes and Genomes (KEGG). Our comprehensively predicted reaction networks of 15 698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics. AVAILABILITY: Software is available on request. Supplementary material is available at http://web.kuicr.kyoto-u.ac.jp/supp/kot/ismb2013/.


Subject(s)
Metabolic Networks and Pathways , Metabolomics/methods , Algorithms , Enzymes/metabolism , Linear Models , Metabolome , Support Vector Machine
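The abstract of entry 8 mentions feature vectors that represent the chemical transformation pattern of a compound-compound pair, built from chemical fingerprints. The abstract does not give the exact encoding, so the following is only a minimal sketch under an assumed scheme: each compound is a binary fingerprint, and the pair is described by the bits the two compounds share, the bits lost from the substrate, and the bits gained in the product. The function name `pair_features` and the toy 4-bit fingerprints are hypothetical and are not taken from the paper.

```python
from typing import List


def pair_features(substrate: List[int], product: List[int]) -> List[int]:
    """Describe a compound pair by three concatenated bit vectors.

    Assumed encoding (not confirmed by the abstract):
      common - substructure bits present in both compounds
      lost   - bits present only in the substrate
      gained - bits present only in the product
    """
    if len(substrate) != len(product):
        raise ValueError("fingerprints must have the same length")
    common = [s & p for s, p in zip(substrate, product)]
    lost = [s & (1 - p) for s, p in zip(substrate, product)]
    gained = [(1 - s) & p for s, p in zip(substrate, product)]
    return common + lost + gained


# Toy 4-bit fingerprints, purely illustrative.
substrate_fp = [1, 1, 0, 0]
product_fp = [1, 0, 1, 0]
features = pair_features(substrate_fp, product_fp)
print(features)  # [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
```

A vector of this kind could then be passed to any sparse linear classifier; which classifier and which fingerprint the authors actually used is not stated in the abstract.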
9.
J Chem Inf Model ; 53(3): 613-22, 2013 Mar 25.
Article in English | MEDLINE | ID: mdl-23384306

ABSTRACT

The metabolic network is both a network of chemical reactions and a network of enzymes that catalyze reactions. Toward better understanding of this duality in the evolution of the metabolic network, we developed a method to extract conserved sequences of reactions called reaction modules from the analysis of chemical compound structure transformation patterns in all known metabolic pathways stored in the KEGG PATHWAY database. The extracted reaction modules are repeatedly used as if they are building blocks of the metabolic network and contain chemical logic of organic reactions. Furthermore, the reaction modules often correspond to traditional pathway modules defined as sets of enzymes in the KEGG MODULE database and sometimes to operon-like gene clusters in prokaryotic genomes. We identified well-conserved, possibly ancient, reaction modules involving 2-oxocarboxylic acids. The chain extension module that appears as the tricarboxylic acid (TCA) reaction sequence in the TCA cycle is now shown to be used in other pathways together with different types of modification modules. We also identified reaction modules and their connection patterns for aromatic ring cleavages in microbial biodegradation pathways, which are most characteristic in terms of both distinct reaction sequences and distinct gene clusters. The modular architecture of biodegradation modules will have a potential for predicting degradation pathways of xenobiotic compounds. The collection of these and many other reaction modules is made available as part of the KEGG database.


Subject(s)
Conserved Sequence , Metabolic Networks and Pathways/genetics , Biotransformation , Citric Acid Cycle/genetics , Databases, Genetic , Enzymes/chemistry , Fatty Acids/chemical synthesis , Multigene Family , Oxidation-Reduction
10.
Nucleic Acids Res ; 41(Database issue): D353-7, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193276

ABSTRACT

The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and 129 archaea). The OCs were constructed by applying the quasi-clique-based clustering method to all possible protein coding genes in all complete genomes, based on their amino acid sequence similarities. Calculating OCs is computationally efficient, which enables regular updates of the contents. KEGG OC has the following two features: (i) It consists of all complete genomes of a wide variety of organisms from three domains of life, and the number of organisms is the largest among the existing databases; and (ii) It is compatible with the KEGG database by sharing the same sets of genes and identifiers, which leads to seamless integration of OCs with useful components in KEGG such as biological pathways, pathway modules, functional hierarchy, diseases and drugs. The KEGG OC resources are accessible via OC Viewer that provides an interactive visualization of OCs at different taxonomic levels.


Subject(s)
Databases, Genetic , Genes, Archaeal , Genes, Bacterial , Genes , Algorithms , Classification/methods , Cluster Analysis , Eukaryota/genetics , Genome, Archaeal , Genome, Bacterial , Genomics/methods , Internet , Sequence Homology, Amino Acid
11.
BMC Syst Biol ; 7 Suppl 6: S2, 2013.
Article in English | MEDLINE | ID: mdl-24564846

ABSTRACT

BACKGROUND: To develop hypotheses on unknown metabolic pathways, biochemists frequently rely on literature that uses a free-text format to describe functional groups or substructures. In computational chemistry or cheminformatics, molecules are typically represented by chemical descriptors, i.e., vectors that summarize information on their various properties. However, it is difficult to interpret these chemical descriptors since they are not directly linked to the terminology of functional groups or substructures that the biochemists use. METHODS: In this study, we used the KEGG Chemical Function (KCF) format to computationally describe biochemical substructures in seven attributes that resemble biochemists' way of dealing with substructures. RESULTS: We established the KCF-S (KCF-and-Substructures) format, which adds structural information to KCF. Applying KCF-S to various datasets of molecules revealed substructures that appear specifically in each dataset and characterize it. Structure-based clustering of molecules using KCF-S yielded clusters in which molecular weights and structures were less diverse than those obtained by conventional chemical fingerprints. We further applied KCF-S to find the pairs of molecules that are possibly converted to each other in enzymatic reactions, and KCF-S clearly improved predictive performance over previously reported results. CONCLUSIONS: KCF-S defines biochemical substructures while keeping interpretability, suggesting its potential for further studies in chemical bioinformatics. KCF and KCF-S can be generated automatically from Molfile format, making it possible to handle molecules from any data source.


Subject(s)
Computational Biology/methods , Cluster Analysis , Databases, Chemical , Enzymes/metabolism , Metabolic Networks and Pathways , Reproducibility of Results , Structure-Activity Relationship
12.
Methods Mol Biol ; 802: 19-39, 2012.
Article in English | MEDLINE | ID: mdl-22130871

ABSTRACT

In this chapter, we demonstrate the usability of the KEGG (Kyoto encyclopedia of genes and genomes) databases and tools, especially focusing on the visualization of the omics data. The desktop application KegArray and many Web-based tools are tightly integrated with the KEGG knowledgebase, which helps visualize and interpret large amount of data derived from high-throughput measurement techniques including microarray, metagenome, and metabolome analyses. Recently developed resources for human disease, drug, and plant research are also mentioned.


Subject(s)
Databases, Genetic , Genomics , Software , Data Mining , Disease/genetics , Humans , Internet , Metabolic Networks and Pathways , Metabolome , Pharmaceutical Preparations/chemistry
13.
BMC Bioinformatics ; 12 Suppl 14: S1, 2011 Dec 14.
Article in English | MEDLINE | ID: mdl-22373367

ABSTRACT

BACKGROUND: In contrast to the increasing number of successful genome projects, there remain many orphan metabolites whose synthesis processes are unknown. Metabolites, including these orphan metabolites, can be classified into groups that share the same core substructures, originating from the same biosynthetic pathways. It is known that many metabolites are synthesized by adding building blocks to existing metabolites. Therefore, it is proposed that, for any given group of metabolites, finding the core substructure and the branched substructures can help predict their biosynthetic pathway. There have already been many reports on multiple graph alignment techniques for finding conserved chemical substructures in relatively small molecules. However, they are optimized for ligand binding and are not suitable for metabolomic studies. RESULTS: We developed an efficient multiple graph alignment method named MUCHA (Multiple Chemical Alignment), specialized for finding metabolic building blocks. This method proved effective at finding metabolic building blocks while preserving the relative positions among the substructures, which is not achieved by simply applying frequent graph mining techniques. Compared with combined pairwise alignments, the proposed MUCHA method generally reduced computational costs while improving the quality of the alignment. CONCLUSIONS: MUCHA successfully finds building blocks of secondary metabolites and has the potential to complement other existing methods for reconstructing metabolic networks using reaction patterns.


Subject(s)
Chemistry/methods , Metabolic Networks and Pathways , Algorithms
14.
Nucleic Acids Res ; 39(Database issue): D677-84, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097783

ABSTRACT

Correlations of gene-to-gene co-expression and metabolite-to-metabolite co-accumulation calculated from large amounts of transcriptome and metabolome data are useful for uncovering unknown functions of genes, functional diversities of gene family members and regulatory mechanisms of metabolic pathway flows. Many databases and tools are available to interpret quantitative transcriptome and metabolome data, but only a limited number connect correlation data to biological knowledge and can be used to find its biological significance. We report here a new metabolic pathway database, KaPPA-View4 (http://kpv.kazusa.or.jp/kpv4/), which is able to overlay gene-to-gene and/or metabolite-to-metabolite relationships as curves on a metabolic pathway map, or on a combination of up to four maps. This representation would help to discover, for example, novel functions of a transcription factor that regulates genes on a metabolic pathway. Pathway maps of the Kyoto Encyclopedia of Genes and Genomes (KEGG) and maps generated from their gene classifications are available at KaPPA-View4 KEGG version (http://kpv.kazusa.or.jp/kpv4-kegg/). At present, gene co-expression data from the databases ATTED-II, COXPRESdb, CoP and MiBASE for human, mouse, rat, Arabidopsis, rice, tomato and other plants are available.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Gene Regulatory Networks , Metabolic Networks and Pathways/genetics , Metabolome/genetics , Animals , Humans , Internet , Mice , Rats
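Entry 14 refers to gene-to-gene co-expression correlations computed from transcriptome data. The abstract does not say which correlation measure the underlying databases use, so the sketch below simply assumes a Pearson correlation between two expression profiles measured over the same samples. The function name `pearson` and the example values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two expression profiles.

    Assumption: both profiles are measured over the same samples, in the
    same order. Raises ValueError for mismatched, too-short, or constant
    profiles, for which the coefficient is undefined.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


# Illustrative expression values for two genes across three samples.
gene_a = [1.0, 2.0, 3.0]
gene_b = [2.0, 4.0, 6.0]
r = pearson(gene_a, gene_b)  # close to 1.0: the profiles rise together
```

Pairs whose coefficient exceeds a chosen threshold would be the ones drawn as connecting curves on a pathway map; the threshold itself is a user choice and is not given in the abstract.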
15.
Nucleic Acids Res ; 38(Web Server issue): W138-43, 2010 Jul.
Article in English | MEDLINE | ID: mdl-20435670

ABSTRACT

The KEGG RPAIR database is a collection of biochemical structure transformation patterns, called RDM patterns, and chemical structure alignments of substrate-product pairs (reactant pairs) in all known enzyme-catalyzed reactions taken from the Enzyme Nomenclature and the KEGG PATHWAY database. Here, we present PathPred (http://www.genome.jp/tools/pathpred/), a web-based server to predict plausible pathways of multi-step reactions starting from a query compound, based on the local RDM pattern match and the global chemical structure alignment against the reactant pair library. In this server, we focus on predicting pathways for microbial biodegradation of environmental compounds and biosynthesis of plant secondary metabolites, which correspond to characteristic RDM patterns in 947 and 1397 reactant pairs, respectively. The server provides transformed compounds and reference transformation patterns in each predicted reaction, and displays all predicted multi-step reaction pathways in a tree-shaped graph.


Subject(s)
Enzymes/metabolism , Metabolic Networks and Pathways , Software , Biocatalysis , Biosynthetic Pathways , Environmental Pollutants/metabolism , Internet
16.
Genome Inform ; 24: 104-15, 2010.
Article in English | MEDLINE | ID: mdl-22081593

ABSTRACT

Many cofactors and nucleotides containing sulfur atoms are known to have important functions in a variety of organisms. Recently, the biosynthetic pathways of these sulfur-containing compounds have been revealed, where many enzymes relay sulfur atoms. Increasing evidence also suggests that the prokaryotic sulfur-relay enzymes might be the evolutionary origin of ubiquitination and the related systems that control a wide range of physiological processes in eukaryotic cells. However, these sulfur-relay enzymes have been studied in only a small number of organisms. Here we carried out comparative genomic analysis and examined the presence and absence of sulfurtransferases utilized in the biosynthetic pathways of molybdenum cofactor (Moco), 2-thiouridine (S(2)U), and 4-thiouridine (S(4)U), and IscS, a cysteine desulfurase. We found that all eukaryotes and many other organisms lack the intermediate enzymes in S(2)U biosynthesis. We also found that most genes lack the rhodanese homology domain (RHD), a catalytic domain of sulfurtransferase. Some organisms have a conserved sequence composed of about 100 residues in the C terminus of TusA, different from RHD. Host-associated organisms have a tendency to lose Moco biosynthetic enzymes, and some organisms have a MoaD-MoaE fusion protein. Our findings suggest that sulfur-relay pathways have been so diversified that some putative sulfurtransferases possibly function in other unknown pathways.


Subject(s)
Gene Expression Regulation , Sulfur/metabolism , Sulfurtransferases/metabolism , Algorithms , Animals , Bacterial Proteins/metabolism , Cluster Analysis , Computational Biology/methods , Escherichia coli/genetics , Fungal Proteins/metabolism , Gene Expression Profiling , Genomics , Humans , Protein Structure, Tertiary , Sequence Alignment , Software , Ubiquitin/metabolism
17.
Genome Inform ; 24: 127-38, 2010.
Article in English | MEDLINE | ID: mdl-22081595

ABSTRACT

UGTs (UDP glycosyltransferases) form the largest glycosyltransferase gene family in higher plants, modifying secondary metabolites, hormones, and xenobiotics. This gene family plays an important role in the vast diversity of plant secondary metabolites specific to species. Experimental data on the biochemical activities and physiological roles of plant UGTs are increasing, but most UGTs are still not functionally characterized. To understand their catalytic specificity and function from sequence data, phylogenetic analyses have been carried out mainly in Arabidopsis, but a large-scale, comprehensive approach covering various species has not yet been applied. In this study, we collected 733 UGT sequences derived from 96 plant species and 252 substrate specificity data. We constructed a phylogenetic tree and divided most of these genes into nine sequence groups, which are characterized by biochemical specificity. Furthermore, we performed genome-wide analysis of the UGTs of seven plant species by mapping them into these groups. We propose that this is a first step toward understanding the full set of glycosylated secondary metabolites of each plant species from its genome information.


Subject(s)
Computational Biology/methods , Glucuronosyltransferase/genetics , Plant Proteins/genetics , Algorithms , Arabidopsis/enzymology , Arabidopsis/genetics , Catalysis , Genes, Plant , Glycosylation , Multigene Family , Phylogeny , Plants/genetics , Protein Binding , Software , Substrate Specificity
18.
Carbohydr Res ; 344(7): 881-7, 2009 May 12.
Article in English | MEDLINE | ID: mdl-19327755

ABSTRACT

Glycosyltransferases comprise highly divergent groups of enzymes, which play a central role in the synthesis of complex glycans. Because the repertoire of glycosyltransferases in the genome determines the range of synthesizable glycans, and because the increasing amount of genome sequence data is now available, it is essential to examine these enzymes across organisms to explore possible structures and functions of the glycoconjugates. In this study, we systematically investigated 36 eukaryotic genomes and obtained 3426 glycosyltransferase homologs for biosynthesis of major glycans, classified into 53 families based on sequence similarity. The families were further grouped into six functional categories based on the biosynthetic pathways, which revealed characteristic patterns among organism groups in the degree of conservation and in the number of paralogs. The results also revealed a strong correlation between the number of glycosyltransferases and the number of coding genes in each genome. We then predicted the ability to synthesize major glycan structures including N-glycan precursors and GPI-anchors in each organism from the combination of the glycosyltransferase families. This indicates that not only parasitic protists but also some algae are likely to synthesize smaller structures than the structures known to be conserved among a wide range of eukaryotes. Finally we discuss the functions of two large families, sialyltransferases and beta 4-glycosyltransferases, by performing finer classifications into subfamilies. Our findings suggest that universality and diversity of glycans originate from two types of evolution of glycosyltransferase families, namely conserved families with few paralogs and diverged families with many paralogs.


Subject(s)
Eukaryotic Cells/enzymology , Eukaryotic Cells/metabolism , Genome/genetics , Glycosyltransferases/classification , Glycosyltransferases/genetics , Polysaccharides/biosynthesis , Polysaccharides/chemistry , Animals , Glycosyltransferases/metabolism , Humans , Models, Molecular , Sialyltransferases/classification , Sialyltransferases/genetics , Sialyltransferases/metabolism
19.
Nucleic Acids Res ; 36(Database issue): D480-4, 2008 Jan.
Article in English | MEDLINE | ID: mdl-18077471

ABSTRACT

KEGG (http://www.genome.jp/kegg/) is a database of biological systems that integrates genomic, chemical and systemic functional information. KEGG provides a reference knowledge base for linking genomes to life through the process of PATHWAY mapping, which is to map, for example, a genomic or transcriptomic content of genes to KEGG reference pathways to infer systemic behaviors of the cell or the organism. In addition, KEGG provides a reference knowledge base for linking genomes to the environment, such as for the analysis of drug-target relationships, through the process of BRITE mapping. KEGG BRITE is an ontology database representing functional hierarchies of various biological objects, including molecules, cells, organisms, diseases and drugs, as well as relationships among them. KEGG PATHWAY is now supplemented with a new global map of metabolic pathways, which is essentially a combined map of about 120 existing pathway maps. In addition, smaller pathway modules are defined and stored in KEGG MODULE that also contains other functional units and complexes. The KEGG resource is being expanded to suit the needs for practical applications. KEGG DRUG contains all approved drugs in the US and Japan, and KEGG DISEASE is a new database linking disease genes, pathways, drugs and diagnostic markers.


Subject(s)
Databases, Factual , Genomics , Systems Biology , Disease , Humans , Internet , Metabolic Networks and Pathways , Molecular Structure , Pharmaceutical Preparations/chemistry , Systems Integration , User-Computer Interface
20.
Genome Inform ; 19: 3-14, 2007.
Article in English | MEDLINE | ID: mdl-18546500

ABSTRACT

Almost half of biological molecules (proteins and metabolites) are estimated to be glycosylated within cells. Detection of glycosylation patterns and of attached sugar types is therefore an important step in future glycomics research. We present two algorithms to detect sugar types in Haworth projection, i.e., from x-y coordinates. The algorithms were applied to a flavonoid database and identified backbone-specific biases of sugar types and their conjugated positions. The algorithms help not only to bridge polysaccharide databases and pathway databases, but also to detect structural errors in metabolic databases.


Subject(s)
Computational Biology/methods , Monosaccharides/chemistry , Algorithms , Carbohydrates/chemistry , Glycosylation , Models, Chemical , Molecular Conformation , Plants , Polysaccharides/chemistry , Programming Languages , Stereoisomerism