Results 1 - 20 of 157
1.
Protein Sci ; 32(12): e4820, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37881892

ABSTRACT

The KEGG database and analysis tools (https://www.kegg.jp) have been developed mostly for understanding genes and genomes of cellular organisms. The KO (KEGG Orthology) dataset, which is a collection of functional orthologs, plays the role of linking genes in the genome to pathways and other molecular networks, enabling KEGG mapping to uncover hidden features in the genome. Although viruses were part of KEGG for some time, they were not fully integrated in the KEGG analysis tools, because the KO assignment rate is very low for virus genes. To supplement KOs, a new dataset named virus ortholog clusters (VOCs) is computationally generated, covering 90% of viral proteins in KEGG. VOCs can be used, in place of KOs, for taxonomy mapping to uncover relationships of sequence similarity groups and taxonomic groups and for identifying conserved gene orders in virus genomes. Furthermore, selected VOCs are used to define tentative KOs for characterizing protein functions. Here, an overview of KEGG tools is presented, focusing on these extensions for viral protein analysis.


Subject(s)
Viral Proteins , Viruses , Viral Proteins/genetics , Genome , Databases, Factual , Viruses/genetics
2.
Nucleic Acids Res ; 51(D1): D587-D592, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36300620

ABSTRACT

KEGG (https://www.kegg.jp) is a manually curated database resource integrating various biological objects categorized into systems, genomic, chemical and health information. Each object (database entry) is identified by the KEGG identifier (kid), which generally takes the form of a prefix followed by a five-digit number, and can be retrieved by appending /entry/kid in the URL. The KEGG pathway map viewer, the Brite hierarchy viewer and the newly released KEGG genome browser can be launched by appending /pathway/kid, /brite/kid and /genome/kid, respectively, in the URL. Together with an improved annotation procedure for KO (KEGG Orthology) assignment, an increasing number of eukaryotic genomes have been included in KEGG for better representation of organisms in the taxonomic tree. Multiple taxonomy files are generated for classification of KEGG organisms and viruses, and the Brite hierarchy viewer is used for taxonomy mapping, a variant of Brite mapping in the new KEGG Mapper suite. The taxonomy mapping enables analysis of, for example, how functional links of genes in the pathway and physical links of genes on the chromosome are conserved among organism groups.
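
The URL scheme described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the function names and the regular expression are assumptions, and since the abstract says identifiers only "generally" follow the prefix-plus-five-digits form, the pattern check is a heuristic.

```python
import re

# Base URL and viewer path segments as named in the abstract.
KEGG_BASE = "https://www.kegg.jp"
VIEWERS = ("entry", "pathway", "brite", "genome")

# Assumption: a typical KEGG identifier is an alphabetic prefix followed by
# exactly five digits (e.g. K00001, map00010). Exceptions may exist.
KID_PATTERN = re.compile(r"[A-Za-z]+\d{5}")


def looks_like_kid(kid: str) -> bool:
    """Heuristic check for the 'prefix + five-digit number' form."""
    return KID_PATTERN.fullmatch(kid) is not None


def kegg_url(viewer: str, kid: str) -> str:
    """Build a URL by appending /<viewer>/<kid> to the KEGG base URL."""
    if viewer not in VIEWERS:
        raise ValueError(f"unknown viewer: {viewer!r}")
    return f"{KEGG_BASE}/{viewer}/{kid}"


if __name__ == "__main__":
    print(kegg_url("entry", "K00001"))
    print(kegg_url("pathway", "map00010"))
```

Keeping the format check separate from URL construction means an identifier that does not fit the usual pattern can still be looked up.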


Subject(s)
Genome , Genomics , Genomics/methods , Databases, Factual , Databases, Genetic
3.
Protein Sci ; 31(1): 47-53, 2022 01.
Article in English | MEDLINE | ID: mdl-34423492

ABSTRACT

In contrast to artificial intelligence and machine learning approaches, KEGG (https://www.kegg.jp) has relied on human intelligence to develop "models" of biological systems, especially in the form of KEGG pathway maps that are manually created by capturing knowledge from published literature. The KEGG models can then be used in biological big data analysis, for example, for uncovering systemic functions of an organism hidden in its genome sequence through the simple procedure of KEGG mapping. Here we present an updated version of KEGG Mapper, a suite of KEGG mapping tools reported previously (Kanehisa and Sato, Protein Sci 2020; 29:28-35), together with the new versions of the KEGG pathway map viewer and the BRITE hierarchy viewer. Significant enhancements have been made for BRITE mapping, where the mapping result can be examined by manipulation of hierarchical trees, such as pruning and zooming. The tree manipulation feature has also been implemented in the taxonomy mapping tool for linking KO (KEGG Orthology) groups and modules to phenotypes.


Subject(s)
Artificial Intelligence , Computational Biology , Databases, Genetic , Software
4.
Nucleic Acids Res ; 49(D1): D545-D551, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33125081

ABSTRACT

KEGG (https://www.kegg.jp/) is a manually curated resource integrating eighteen databases categorized into systems, genomic, chemical and health information. It also provides KEGG mapping tools, which enable understanding of cellular and organism-level functions from genome sequences and other molecular datasets. KEGG mapping is a predictive method of reconstructing molecular network systems from molecular building blocks based on the concept of functional orthologs. Since the introduction of the KEGG NETWORK database, various diseases have been associated with network variants, which are perturbed molecular networks caused by human gene variants, viruses, other pathogens and environmental factors. The network variation maps are created as aligned sets of related networks showing, for example, how different viruses inhibit or activate specific cellular signaling pathways. The KEGG pathway maps are now integrated with network variation maps in the NETWORK database, as well as with conserved functional units of KEGG modules and reaction modules in the MODULE database. The KO database for functional orthologs continues to be improved and virus KOs are being expanded for better understanding of virus-cell interactions and for enabling prediction of viral perturbations.


Subject(s)
Cells/metabolism , Viruses/metabolism , Apoptosis/genetics , Gene Regulatory Networks , Genome , Humans , Metabolic Networks and Pathways/genetics , Molecular Sequence Annotation
5.
Protein Sci ; 29(1): 28-35, 2020 01.
Article in English | MEDLINE | ID: mdl-31423653

ABSTRACT

KEGG is a reference knowledge base for biological interpretation of large-scale molecular datasets, such as genome and metagenome sequences. It accumulates experimental knowledge about high-level functions of the cell and the organism represented in terms of KEGG molecular networks, including KEGG pathway maps, BRITE hierarchies, and KEGG modules. By the process called KEGG mapping, a set of protein coding genes in the genome, for example, can be converted to KEGG molecular networks enabling interpretation of cellular functions and other high-level features. Here we report a new version of KEGG Mapper, a suite of KEGG mapping tools available at the KEGG website (https://www.kegg.jp/ or https://www.genome.jp/kegg/), together with the KOALA family tools for automatic assignment of KO (KEGG Orthology) identifiers used in the mapping.


Subject(s)
Computational Biology/methods , Proteins/genetics , Proteins/metabolism , Amino Acid Sequence , Databases, Protein , Molecular Sequence Annotation , Protein Interaction Mapping
6.
Bioinformatics ; 36(7): 2251-2252, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31742321

ABSTRACT

SUMMARY: KofamKOALA is a web server to assign KEGG Orthologs (KOs) to protein sequences by homology search against a database of profile hidden Markov models (KOfam) with pre-computed adaptive score thresholds. KofamKOALA is faster than existing KO assignment tools with its accuracy being comparable to the best performing tools. Function annotation by KofamKOALA helps linking genes to KEGG resources such as the KEGG pathway maps and facilitates molecular network reconstruction. AVAILABILITY AND IMPLEMENTATION: KofamKOALA, KofamScan and KOfam are freely available from GenomeNet (https://www.genome.jp/tools/kofamkoala/). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
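
The idea of assigning KOs with per-profile score thresholds can be illustrated with a small sketch. It is not the KofamKOALA implementation: the function name, the hit tuple layout, and all scores and thresholds below are hypothetical, and the rule of keeping the single best-scoring KO per gene is an assumption made for simplicity.

```python
from typing import Dict, List, Tuple

# A hit is (gene_id, ko_id, score); in practice the score would come from a
# profile HMM search. All values used here are made up for illustration.
Hit = Tuple[str, str, float]


def assign_kos(hits: List[Hit], thresholds: Dict[str, float]) -> Dict[str, str]:
    """Keep, for each gene, the best-scoring KO whose score reaches that
    KO's own precomputed threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for gene, ko, score in hits:
        cutoff = thresholds.get(ko)
        # Skip KOs without a threshold and hits below the KO-specific cutoff.
        if cutoff is None or score < cutoff:
            continue
        if gene not in best or score > best[gene][1]:
            best[gene] = (ko, score)
    return {gene: ko for gene, (ko, _) in best.items()}


if __name__ == "__main__":
    example_hits = [
        ("gene1", "K00001", 250.0),
        ("gene1", "K00002", 300.0),  # higher score, but below its own cutoff
        ("gene2", "K00001", 100.0),
    ]
    example_thresholds = {"K00001": 200.0, "K00002": 400.0}
    print(assign_kos(example_hits, example_thresholds))
```

The example shows why a per-KO cutoff differs from a single global cutoff: the higher raw score for `K00002` is rejected because that profile demands more.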


Subject(s)
Computers , Amino Acid Sequence , Databases, Factual
7.
Protein Sci ; 28(11): 1947-1951, 2019 11.
Article in English | MEDLINE | ID: mdl-31441146

ABSTRACT

In this era of high-throughput biology, bioinformatics has become a major discipline for making sense out of large-scale datasets. Bioinformatics is usually considered a practical field developing databases and software tools for supporting other fields, rather than a fundamental scientific discipline for uncovering principles of biology. The KEGG resource that we have been developing is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is now one of the most utilized biological databases because of its practical value. For me personally, KEGG is a step toward understanding the origin and evolution of cellular organisms.


Subject(s)
Computational Biology , Databases, Genetic , High-Throughput Screening Assays , Humans , Software
8.
Nucleic Acids Res ; 47(D1): D590-D595, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30321428

ABSTRACT

KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp/ or https://www.genome.jp/kegg/) is a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It is an integrated database consisting of three generic categories of systems information, genomic information and chemical information, and an additional human-specific category of health information. KEGG pathway maps, BRITE hierarchies and KEGG modules have been developed as generic molecular networks with KEGG Orthology nodes of functional orthologs so that KEGG pathway mapping and other procedures can be applied to any cellular organism. Unfortunately, however, this generic approach was inadequate for knowledge representation in the health information category, where variations of human genomes, especially disease-related variations, had to be considered. Thus, we have introduced a new approach where human gene variants are explicitly incorporated into what we call 'network variants' in the recently released KEGG NETWORK database. This allows accumulation of knowledge about disease-related perturbed molecular networks caused not only by gene variants, but also by viruses and other pathogens, environmental factors and drugs. We expect that KEGG NETWORK will become another reference knowledge base for the basic understanding of disease mechanisms and practical use in clinical sequencing and drug development.


Subject(s)
Databases, Genetic , Genetic Variation , Genome-Wide Association Study/methods , Genomics/methods , Genome , Humans , Software
9.
Methods Mol Biol ; 1807: 225-239, 2018.
Article in English | MEDLINE | ID: mdl-30030815

ABSTRACT

The KEGG database is widely used as a reference knowledge base for biological interpretation of genome sequences and other high-throughput data. It contains, among others, KEGG pathway maps and BRITE hierarchies (ontologies) representing high-level systemic functions of the cell and the organism. By the processes called pathway mapping and BRITE mapping, information encoded in the genome, especially the repertoire of genes, is converted to such high-level functional information. This general methodology can be applied to microbial genomes to infer antimicrobial resistance (AMR), which is becoming an increasingly serious threat to global public health. Here we present how knowledge on AMR is accumulated in the KEGG Pathogen resource and how such knowledge can be utilized by BlastKOALA and other web tools.


Subject(s)
Anti-Bacterial Agents/pharmacology , Databases, Genetic , Drug Resistance, Bacterial/genetics , Genome, Bacterial , Carbapenems/pharmacology , Drug Resistance, Bacterial/drug effects , Phylogeny , beta-Lactamases/metabolism , beta-Lactams/pharmacology
10.
Methods Mol Biol ; 1611: 135-145, 2017.
Article in English | MEDLINE | ID: mdl-28451977

ABSTRACT

KEGG is an integrated database resource for linking sequences to biological functions from molecular to higher levels. Knowledge on molecular functions is stored in the KO (KEGG Orthology) database, while cellular- and organism-level functions are represented in the PATHWAY and MODULE databases. Genes in the complete genomes, which are stored in the GENES database, are given KO identifiers by the internal annotation procedure, enabling reconstruction of KEGG pathways and modules for interpretation of higher-level functions. This is possible because all the KEGG pathways and modules are represented as networks of KO nodes. Here we present knowledge-based prediction methods for functional characterization of amino acid sequences using the KEGG resource. Specifically we show how the tools available at the KEGG website including BlastKOALA and KEGG Mapper can be utilized for enzyme annotation and metabolic reconstruction.
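
Because pathways and modules are represented as networks of KO nodes, the reconstruction step amounts to intersecting a genome's KO assignments with each network's node set. The sketch below illustrates that idea only; the function name, the network names, and the node sets are hypothetical placeholders rather than real KEGG content.

```python
from typing import Dict, List, Set


def map_to_networks(
    gene_to_ko: Dict[str, str],
    network_nodes: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Report, for each network, which of its KO nodes are present among
    the KO identifiers assigned to the genome's genes."""
    present = set(gene_to_ko.values())
    result: Dict[str, List[str]] = {}
    for name, nodes in network_nodes.items():
        found = sorted(present & nodes)
        if found:  # omit networks with no matching KO node
            result[name] = found
    return result


if __name__ == "__main__":
    # Hypothetical annotation and network definitions for illustration only.
    annotation = {"geneA": "K00001", "geneB": "K00002", "geneC": "K00005"}
    networks = {
        "network_1": {"K00001", "K00002", "K00003"},
        "network_2": {"K00004"},
    }
    print(map_to_networks(annotation, networks))
```

A fuller version would also report the missing nodes of each network, which is what makes partially reconstructed pathways interpretable.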


Subject(s)
Computational Biology/methods , Databases, Genetic , Genomics/methods
11.
PLoS One ; 12(4): e0176530, 2017.
Article in English | MEDLINE | ID: mdl-28445522

ABSTRACT

Genome-wide scans for positive selection have become important for genomic medicine, and many studies aim to find genomic regions affected by positive selection that are associated with risk allele variations among populations. Most such studies are designed to detect recent positive selection. However, we hypothesize that ancient positive selection is also important for adaptation to pathogens, and has affected current immune-mediated common diseases. Based on this hypothesis, we developed a novel linkage disequilibrium-based pipeline, which aims to detect regions associated with ancient positive selection across populations from single nucleotide polymorphism (SNP) data. By applying this pipeline to the genotypes in the International HapMap project database, we show that genes in the detected regions are enriched in pathways related to the immune system and infectious diseases. The detected regions also contain SNPs reported to be associated with cancers and metabolic diseases, obesity-related traits, type 2 diabetes, and allergic sensitization. These SNPs were further mapped to biological pathways to determine the associations between phenotypes and molecular functions. Assessments of candidate regions to identify functions associated with variations in incidence rates of these diseases are needed in the future.


Subject(s)
Genome, Human , Genome-Wide Association Study , Databases, Genetic , Genetics, Population , Genotype , HapMap Project , Haplotypes , Humans , Linkage Disequilibrium , Metabolic Diseases/genetics , Metabolic Diseases/pathology , Monte Carlo Method , Multigene Family , Neoplasms/genetics , Neoplasms/pathology , Neurodegenerative Diseases/genetics , Neurodegenerative Diseases/pathology , Phenotype , Polymorphism, Single Nucleotide
12.
Nucleic Acids Res ; 45(D1): D353-D361, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899662

ABSTRACT

KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an encyclopedia of genes and genomes. Assigning functional meanings to genes and genomes both at the molecular and higher levels is the primary objective of the KEGG database project. Molecular-level functions are stored in the KO (KEGG Orthology) database, where each KO is defined as a functional ortholog of genes and proteins. Higher-level functions are represented by networks of molecular interactions, reactions and relations in the forms of KEGG pathway maps, BRITE hierarchies and KEGG modules. In the past the KO database was developed for the purpose of defining nodes of molecular networks, but now the content has been expanded and the quality improved irrespective of whether or not the KOs appear in the three molecular network databases. The newly introduced addendum category of the GENES database is a collection of individual proteins whose functions are experimentally characterized and from which an increasing number of KOs are defined. Furthermore, the DISEASE and DRUG databases have been improved by systematic analysis of drug labels for better integration of diseases and drugs with the KEGG molecular networks. KEGG is moving towards becoming a comprehensive knowledge base for both functional interpretation and practical application of genomic information.


Subject(s)
Computational Biology/methods , Databases, Genetic , Genomics/methods , Drug Discovery , Metabolic Networks and Pathways , Web Browser
13.
J Chem Inf Model ; 56(3): 510-6, 2016 Mar 28.
Article in English | MEDLINE | ID: mdl-26822930

ABSTRACT

Although there are several databases that contain data on many metabolites and reactions in biochemical pathways, there is still a large gap between the numbers of experimentally identified enzymes and metabolites. It is supposed that many catalytic enzyme genes are still unknown. Although there are previous studies that estimate the number of candidate enzyme genes, these studies required some additional information aside from the structures of metabolites, such as gene expression and order in the genome. In this study, we developed a novel method to identify a candidate enzyme gene of a reaction using the chemical structures of the substrate-product pair (reactant pair). The proposed method is based on a search for similar reactant pairs in a reference database and offers ortholog groups that possibly mediate the given reaction. We applied the proposed method to two experimentally validated reactions. As a result, we confirmed that the histidine transaminase was correctly identified. Although our method could not directly identify the asparagine oxo-acid transaminase, we successfully found the paralog gene most similar to the correct enzyme gene. We also applied our method to infer candidate enzyme genes in the mesaconate pathway. The advantage of our method lies in the prediction of possible genes for orphan enzyme reactions for which no associated gene sequences have yet been determined. We believe that this approach will facilitate experimental identification of genes for orphan enzymes.
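
The search for similar reactant pairs can be sketched as a ranking problem. The abstract does not state which similarity measure the authors use, so this example swaps in the Tanimoto (Jaccard) coefficient over hypothetical feature sets; the function names, feature labels, and ortholog group names are all assumptions for illustration.

```python
from typing import FrozenSet, List, Tuple

# A reference record pairs a reactant-pair feature set with the ortholog
# group annotated for that reaction. Both are hypothetical here.
Reference = Tuple[FrozenSet[str], str]


def tanimoto(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Tanimoto coefficient: shared features divided by total features."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_orthologs(
    query: FrozenSet[str], references: List[Reference], top_n: int = 3
) -> List[str]:
    """Rank reference reactant pairs by similarity to the query and return
    the ortholog groups of the top_n most similar pairs, without repeats."""
    ranked = sorted(references, key=lambda ref: tanimoto(query, ref[0]), reverse=True)
    groups: List[str] = []
    for _, group in ranked[:top_n]:
        if group not in groups:
            groups.append(group)
    return groups


if __name__ == "__main__":
    query_pair = frozenset({"f1", "f2", "f3"})
    reference_db = [
        (frozenset({"f1", "f2", "f3", "f4"}), "group_A"),
        (frozenset({"f1"}), "group_B"),
        (frozenset({"f9"}), "group_C"),
    ]
    print(candidate_orthologs(query_pair, reference_db, top_n=2))
```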


Subject(s)
Enzymes/genetics , Databases, Protein , Enzymes/metabolism , Substrate Specificity
14.
Methods Mol Biol ; 1374: 55-70, 2016.
Article in English | MEDLINE | ID: mdl-26519400

ABSTRACT

In the era of high-throughput biology it is necessary to develop not only elaborate computational methods but also well-curated databases that can be used as reference for data interpretation. KEGG (http://www.kegg.jp/) is such a reference knowledge base with two specific aims. One is to compile knowledge on high-level functions of the cell and the organism in terms of the molecular interaction and reaction networks, which is implemented in KEGG pathway maps, BRITE functional hierarchies, and KEGG modules. The other is to expand knowledge on genes and proteins involved in the molecular networks from experimentally observed organisms to other organisms using the concept of orthologs, which is implemented in the KEGG Orthology (KO) system. Thus, KEGG is a generic resource applicable to all organisms and enables interpretation of high-level functions from genomic and molecular data. Here we first present a brief overview of the entire KEGG resource, and then give an introduction of how to use KEGG in plant genomics and metabolomics research.


Subject(s)
Computational Biology/methods , Genomics/methods , Metabolomics/methods , Plants/genetics , Plants/metabolism , Databases, Genetic , Web Browser
15.
Nucleic Acids Res ; 44(D1): D457-62, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26476454

ABSTRACT

KEGG (http://www.kegg.jp/ or http://www.genome.jp/kegg/) is an integrated database resource for biological interpretation of genome sequences and other high-throughput data. Molecular functions of genes and proteins are associated with ortholog groups and stored in the KEGG Orthology (KO) database. The KEGG pathway maps, BRITE hierarchies and KEGG modules are developed as networks of KO nodes, representing high-level functions of the cell and the organism. Currently, more than 4000 complete genomes are annotated with KOs in the KEGG GENES database, which can be used as a reference data set for KO assignment and subsequent reconstruction of KEGG pathways and other molecular networks. As an annotation resource, the following improvements have been made. First, each KO record is re-examined and associated with protein sequence data used in experiments of functional characterization. Second, the GENES database now includes viruses, plasmids, and the addendum category for functionally characterized proteins that are not represented in complete genomes. Third, new automatic annotation servers, BlastKOALA and GhostKOALA, are made available utilizing the non-redundant pangenome data set generated from the GENES database. As a resource for translational bioinformatics, various data sets are created for antimicrobial resistance and drug interaction networks.


Subject(s)
Amino Acid Sequence , Databases, Genetic , Genes , Molecular Sequence Annotation , Drug Resistance, Microbial , Genome , Metabolic Networks and Pathways , Plasmids/genetics , Proteins/genetics , Viruses/genetics
16.
J Mol Biol ; 428(4): 726-731, 2016 Feb 22.
Article in English | MEDLINE | ID: mdl-26585406

ABSTRACT

BlastKOALA and GhostKOALA are automatic annotation servers for genome and metagenome sequences, which perform KO (KEGG Orthology) assignments to characterize individual gene functions and reconstruct KEGG pathways, BRITE hierarchies and KEGG modules to infer high-level functions of the organism or the ecosystem. Both servers are made freely available at the KEGG Web site (http://www.kegg.jp/blastkoala/). In BlastKOALA, the KO assignment is performed by a modified version of the internally used KOALA algorithm after the BLAST search against a non-redundant dataset of pangenome sequences at the species, genus or family level, which is generated from the KEGG GENES database by retaining the KO content of each taxonomic category. In GhostKOALA, which utilizes more rapid GHOSTX for database search and is suitable for metagenome annotation, the pangenome dataset is supplemented with Cd-hit clusters including those for viral genes. The result files may be downloaded and manipulated for further KEGG Mapper analysis, such as comparative pathway analysis using multiple BlastKOALA results.


Subject(s)
Computational Biology/methods , Genome , Metagenome , Sequence Analysis, DNA/methods , Internet
18.
Methods Mol Biol ; 1273: 97-107, 2015.
Article in English | MEDLINE | ID: mdl-25753705

ABSTRACT

This chapter describes the KEGG GLYCAN database of the KEGG resource, including descriptions of links to the other databases in KEGG. In particular, KEGG GLYCAN consists of glycan structures, with links to glycogenes, orthologs, reactions, pathways, drugs, diseases, and others, all within the KEGG resources. A number of analytical tools are also available, including the composite structure map (CSM), KegDraw, KCam, and GECS. These databases and tools will be described along with simple examples of their usage.


Subject(s)
Databases, Factual , Glycomics/methods , Polysaccharides/chemistry , Carbohydrate Sequence , Gene Expression
19.
J Bioinform Comput Biol ; 12(6): 1442001, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25385078

ABSTRACT

Genomics is faced with the issue of many partially annotated putative enzyme-encoding genes for which activities have not yet been verified, while metabolomics is faced with the issue of many putative enzyme reactions for which full equations have not been verified. Knowledge of enzymes has been collected by IUBMB, and has been made public as the Enzyme List. To date, however, the terminology of the Enzyme List has not been assessed comprehensively by bioinformatics studies. Instead, most of the bioinformatics studies simply use the identifiers of the enzymes, i.e. the Enzyme Commission (EC) numbers. We investigated the actual usage of terminology throughout the Enzyme List, and demonstrated that the partial characteristics of reactions cannot be retrieved by simply using EC numbers. Thus, we developed a novel ontology, named PIERO, for annotating biochemical transformations as follows. First, the terminology describing enzymatic reactions was retrieved from the Enzyme List, and was grouped into those related to overall reactions and biochemical transformations. Subsequently, these terms were mapped onto the actual transformations taken from enzymatic reaction equations. This ontology was linked to Gene Ontology (GO) and EC numbers, allowing the extraction of common partial reaction characteristics from given sets of orthologous genes and the elucidation of possible enzymes from the given transformations. Further development of the PIERO ontology should enhance the Enzyme List to promote the integration of genomics and metabolomics.


Subject(s)
Biological Ontologies , Databases, Protein , Enzymes/chemistry , Enzymes/classification , Information Storage and Retrieval/methods , Terminology as Topic , Enzymes/genetics , Natural Language Processing
20.
Nucleic Acids Res ; 42(Web Server issue): W39-45, 2014 Jul.
Article in English | MEDLINE | ID: mdl-24838565

ABSTRACT

DINIES (drug-target interaction network inference engine based on supervised analysis) is a web server for predicting unknown drug-target interaction networks from various types of biological data (e.g. chemical structures, drug side effects, amino acid sequences and protein domains) in the framework of supervised network inference. The originality of DINIES lies in prediction with state-of-the-art machine learning methods, in the integration of heterogeneous biological data and in compatibility with the KEGG database. The DINIES server accepts any 'profiles' or precalculated similarity matrices (or 'kernels') of drugs and target proteins in tab-delimited file format. When a training data set is submitted to learn a predictive model, users can select either known interaction information in the KEGG DRUG database or their own interaction data. The user can also select an algorithm for supervised network inference, select various parameters in the method and specify weights for heterogeneous data integration. The server can provide integrative analyses with useful components in KEGG, such as biological pathways, functional hierarchy and human diseases. DINIES (http://www.genome.jp/tools/dinies/) is publicly available as one of the genome analysis tools in GenomeNet.
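
The abstract mentions similarity matrices ("kernels") and user-specified weights for heterogeneous data integration, but not how the weights are applied. One common approach is a weighted sum of kernels, sketched below under that assumption; the function name and the example matrices are hypothetical and do not describe the DINIES server's internals.

```python
from typing import List

Matrix = List[List[float]]


def combine_kernels(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Return the weighted sum of square similarity matrices of equal size."""
    if not kernels or len(kernels) != len(weights):
        raise ValueError("need one weight per kernel and at least one kernel")
    n = len(kernels[0])
    for kernel in kernels:
        if len(kernel) != n or any(len(row) != n for row in kernel):
            raise ValueError("all kernels must be square with the same size")
    combined = [[0.0] * n for _ in range(n)]
    for kernel, weight in zip(kernels, weights):
        for i in range(n):
            for j in range(n):
                combined[i][j] += weight * kernel[i][j]
    return combined


if __name__ == "__main__":
    # Two made-up drug-drug similarity matrices from different data types.
    chemical = [[1.0, 0.2], [0.2, 1.0]]
    side_effect = [[1.0, 0.6], [0.6, 1.0]]
    print(combine_kernels([chemical, side_effect], [0.5, 0.5]))
```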


Subject(s)
Artificial Intelligence , Drug Discovery , Proteins/chemistry , Software , Algorithms , Humans , Internet , Pharmaceutical Preparations/chemistry , Protein Structure, Tertiary , Proteins/drug effects , Sequence Analysis, Protein