1.
DNA Res ; 30(4)2023 Aug 01.
Article in English | MEDLINE | ID: mdl-37478310

ABSTRACT

The prediction of gene structure within the genome sequence is the starting point of genome analysis, and its accuracy has a significant impact on the quality of subsequent analyses. Gene structure prediction is roughly divided into RNA-Seq-based methods, ab initio-based methods, homology-based methods, and the integration of individual prediction methods. Integrated methods are mainstream in recent genome projects because they improve prediction accuracy by combining or taking the best individual prediction findings; however, adequate prediction accuracy for eukaryotic species has not yet been achieved. Therefore, we developed an integrated tool, GINGER, that solves various issues related to gene structure prediction in higher eukaryotes. By handling artefacts in alignments of RNA and protein sequences, reconstructing gene structures via dynamic programming with appropriately weighted and scored exon/intron/intergenic regions, and applying different prediction processes and filtering criteria to multi-exon and single-exon genes, we achieved a significant improvement in accuracy compared to the existing integration methods. A distinguishing feature of GINGER is its high prediction accuracy at the gene and exon levels, which is pronounced for species with more complex gene architectures. GINGER is implemented using Nextflow, which allows for the efficient and effective use of computing resources.


Subject(s)
Zingiber officinale , Zingiber officinale/genetics , Eukaryota , Genome , Exons , Introns , Algorithms , Software
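
The abstract above mentions reconstructing gene structures via dynamic programming over scored regions. As a rough illustration only, the sketch below applies a standard weighted-interval dynamic program to pick the best-scoring chain of non-overlapping exon candidates. It is not GINGER's actual algorithm: the function name `best_exon_chain`, the candidate coordinates and the scores are all hypothetical, and intron and intergenic scoring is left out.

```python
from bisect import bisect_right

def best_exon_chain(candidates):
    """Return (total_score, chain) for the best set of non-overlapping exons.

    candidates: list of (start, end, score) with inclusive integer coordinates.
    This is a generic weighted-interval DP, used here as an assumption-laden
    stand-in for the scoring scheme described in the abstract.
    """
    exons = sorted(candidates, key=lambda e: e[1])  # sort by end coordinate
    ends = [e[1] for e in exons]
    best = [0.0] * (len(exons) + 1)    # best[i]: best score using the first i exons
    took = [None] * (len(exons) + 1)   # took[i]: predecessor index if exon i is used

    for i, (start, _end, score) in enumerate(exons, 1):
        # Number of earlier exons that end strictly before this one starts.
        j = bisect_right(ends, start - 1, 0, i - 1)
        take = best[j] + score
        if take > best[i - 1]:
            best[i], took[i] = take, j
        else:
            best[i] = best[i - 1]

    # Trace back through the table to recover the chosen exons.
    chain, i = [], len(exons)
    while i > 0:
        if took[i] is None:
            i -= 1
        else:
            chain.append(exons[i - 1])
            i = took[i]
    return best[-1], chain[::-1]

# Hypothetical candidates: (start, end, score)
result = best_exon_chain([(1, 10, 5.0), (5, 20, 8.0), (12, 30, 4.0), (25, 40, 6.0)])
print(result)  # (14.0, [(5, 20, 8.0), (25, 40, 6.0)])
```

A real integrator would presumably also score the introns and intergenic gaps between the chosen exons, which this sketch omits.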
2.
Bioengineering (Basel) ; 7(4)2020 Nov 19.
Article in English | MEDLINE | ID: mdl-33227954

ABSTRACT

Improving the bioproduction ability of efficient host microorganisms is a central aim in bioengineering. To control biosynthesis in living cells, the regulatory system of the whole biosynthetic pathway should be clearly understood. In this study, we applied our network modeling method to infer the regulatory system for triacylglyceride (TAG) biosynthesis in Lipomyces starkeyi, using factor analyses and structural equation modeling to construct a regulatory network model. By factor analysis, we classified 89 TAG biosynthesis-related genes into nine groups, which were considered different regulatory sub-systems. We constructed two different types of regulatory models. One is the regulatory model for oil productivity, and the other is the whole regulatory model for TAG biosynthesis. From the inferred oil productivity regulatory model, the well characterized genes DGA1 and ACL1 were detected as regulatory factors. Furthermore, we also found unknown feedback controls in oil productivity regulation. These regulation models suggest that the regulatory factor induction targets should be selected carefully. Within the whole regulatory model of TAG biosynthesis, some genes were detected as not related to TAG biosynthesis regulation. Using network modeling, we reveal that the regulatory system is helpful for the new era of bioengineering.

3.
Biosci Biotechnol Biochem ; 82(9): 1515-1517, 2018 Sep.
Article in English | MEDLINE | ID: mdl-29792119

ABSTRACT

MAPLE is an automated system for inferring the potential comprehensive functions harbored by genomes and metagenomes. To reduce MAPLE's runtime when analyzing massive amino acid datasets of over 1 million sequences, we adapted the KEGG automatic annotation server to use GHOSTX and verified that the original and new implementations produce no substantial difference in the MAPLE results.


Subject(s)
Genome , Metagenome , Amino Acids/chemistry , Automation , Computational Biology , Databases, Protein , Datasets as Topic , Software
4.
DNA Res ; 23(5): 467-475, 2016 Oct 01.
Article in English | MEDLINE | ID: mdl-27374611

ABSTRACT

Metabolic and physiological potential evaluator (MAPLE) is an automatic system that can perform a series of steps used in the evaluation of potential comprehensive functions (functionome) harboured in the genome and metagenome. MAPLE first assigns KEGG Orthology (KO) to the query gene, maps the KO-assigned genes to the Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules, and then calculates the module completion ratio (MCR) of each functional module to characterize the potential functionome in the user's own genomic and metagenomic data. In this study, we added two more useful functions to calculate module abundance and Q-value, which indicate the functional abundance and statistical significance of the MCR results, respectively, to the new version of MAPLE for more detailed comparative genomic and metagenomic analyses. Consequently, MAPLE version 2.1.0 reported significant differences in the potential functionome, functional abundance, and diversity of contributors to each function among four metagenomic datasets generated by the global ocean sampling expedition, one of the most popular environmental samples to use with this system. MAPLE version 2.1.0 is now available through the web interface (http://www.genome.jp/tools/maple/; date last accessed 17 June 2016).
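
The KO-to-module mapping and completion ratio described above can be sketched roughly as follows. This is a simplified illustration, not MAPLE's implementation: it assumes each module is a flat list of steps, each completed by any one of a set of alternative KOs, whereas real KEGG module definitions are logical expressions. All module and KO identifiers are placeholders.

```python
from typing import Dict, List, Set

# Hypothetical, simplified module definitions: a module is a list of steps,
# and a step is a set of alternative KO identifiers (any one completes it).
# These names are placeholders, not real KEGG entries.
MODULES: Dict[str, List[Set[str]]] = {
    "M_example_1": [{"K_a"}, {"K_b", "K_c"}, {"K_d"}, {"K_e"}],
    "M_example_2": [{"K_f"}, {"K_g"}],
}

def module_completion_ratio(steps: List[Set[str]], assigned_kos: Set[str]) -> float:
    """Percentage of steps for which at least one alternative KO is present."""
    if not steps:
        return 0.0
    completed = sum(1 for alternatives in steps if alternatives & assigned_kos)
    return 100.0 * completed / len(steps)

def completion_profile(modules: Dict[str, List[Set[str]]],
                       assigned_kos: Set[str]) -> Dict[str, float]:
    """Completion ratio of every module for one genome or metagenome."""
    return {name: module_completion_ratio(steps, assigned_kos)
            for name, steps in modules.items()}

# KOs assigned to the genes of a hypothetical query genome.
genome_kos = {"K_a", "K_c", "K_e", "K_g"}
profile = completion_profile(MODULES, genome_kos)
print(profile)  # {'M_example_1': 75.0, 'M_example_2': 50.0}
```

Module abundance and Q-values, which the abstract adds in version 2.1.0, are not covered by this sketch.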

5.
PLoS One ; 10(7): e0132994, 2015.
Article in English | MEDLINE | ID: mdl-26196861

ABSTRACT

In this study, the metabolic and physiological potential evaluator system based on Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules was employed to establish a functional classification of archaeal species and to determine the comprehensive functions (functionome) of the previously uncultivated thermophile "Candidatus Caldiarchaeum subterraneum" (Ca. C. subterraneum). A phylogenetic analysis based on the concatenated sequences of proteins common among 142 archaea and 2 bacteria, and among 137 archaea and 13 unicellular eukaryotes suggested that Ca. C. subterraneum is closely related to thaumarchaeotic species. Consistent with the results of the phylogenetic analysis, clustering and principal component analyses based on the completion ratio patterns for all KEGG modules in 79 archaeal species suggested that the overall metabolic and physiological potential of Ca. C. subterraneum is similar to that of thaumarchaeotic species. However, Ca. C. subterraneum possessed almost no genes in the modules required for nitrification and the hydroxypropionate-hydroxybutyrate cycle for carbon fixation, unlike thaumarchaeotic species. However, it possessed all genes in the modules required for central carbohydrate metabolism, such as glycolysis, pyruvate oxidation, the tricarboxylic acid (TCA) cycle, and the glyoxylate cycle, as well as multiple sets of sugar and branched chain amino acid ABC transporters. These metabolic and physiological features appear to support the predominantly aerobic character of Ca. C. subterraneum, which lives in a subsurface thermophilic microbial mat community with a heterotrophic lifestyle.


Subject(s)
Archaea/classification , Archaea/physiology , Phylogeny , Ammonia/chemistry , Archaea/genetics , Bacteria/genetics , Bacteria/metabolism , Carbohydrates/chemistry , Carbon Cycle , Cluster Analysis , Computational Biology/methods , Databases, Genetic , Genomics , Nitrification , Phenotype , Principal Component Analysis , Species Specificity
6.
Plant Cell ; 27(1): 162-76, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25634988

ABSTRACT

Oleaginous photosynthetic organisms such as microalgae are promising sources for biofuel production through the generation of carbon-neutral sustainable energy. However, the metabolic mechanisms driving high-rate lipid production in these oleaginous organisms remain unclear, thus impeding efforts to improve productivity through genetic modifications. We analyzed the genome and transcriptome of the oleaginous diatom Fistulifera solaris JPCC DA0580. Next-generation sequencing technology provided evidence of an allodiploid genome structure, suggesting unorthodox molecular evolutionary and genetic regulatory systems for reinforcing metabolic efficiencies. Although major metabolic pathways were shared with nonoleaginous diatoms, transcriptome analysis revealed unique expression patterns, such as concomitant upregulation of fatty acid/triacylglycerol biosynthesis and fatty acid degradation (β-oxidation) in concert with ATP production. This peculiar pattern of gene expression may account for the simultaneous growth and oil accumulation phenotype and may inspire novel biofuel production technology based on this oleaginous microalga.


Subject(s)
Diatoms/genetics , Fatty Acids/metabolism , Genome, Plant/genetics , Transcriptome/genetics , Triglycerides/metabolism
7.
PLoS One ; 9(9): e107629, 2014.
Article in English | MEDLINE | ID: mdl-25268590

ABSTRACT

Fistulifera sp. strain JPCC DA0580 is a newly sequenced pennate diatom that is capable of simultaneously growing and accumulating lipids. This is a unique trait, not found in other related microalgae so far. It is able to accumulate between 40 and 60% of its cell weight in lipids, making it a strong candidate for the production of biofuel. To investigate this characteristic, we used RNA-Seq data gathered at four different times while Fistulifera sp. strain JPCC DA0580 was grown in oil accumulating and non-oil accumulating conditions. We then adapted gene set enrichment analysis (GSEA) to investigate the relationship between the difference in gene expression of 7,822 genes and metabolic functions in our data. We utilized information in the KEGG pathway database to create the gene sets and changed GSEA to use re-sampling so that data from the different time points could be included in the analysis. Our GSEA method identified photosynthesis, lipid synthesis and amino acid synthesis related pathways as processes that play a significant role in oil production and growth in Fistulifera sp. strain JPCC DA0580. In addition to GSEA, we visualized the results by creating a network of compounds and reactions, and plotted the expression data on top of the network. This made existing graph algorithms available to us which we then used to calculate a path that metabolizes glucose into triacylglycerol (TAG) in the smallest number of steps. By visualizing the data this way, we observed a separate up-regulation of genes at different times instead of a concerted response. We also identified two metabolic paths that used fewer reactions than the one shown in KEGG and showed that the reactions were up-regulated during the experiment. The combination of analysis and visualization methods successfully analyzed time-course data, identified important metabolic pathways and provided new hypotheses for further research.


Subject(s)
Gene Expression Profiling/methods , Biofuels , Biosynthetic Pathways/genetics , Diatoms/genetics , Diatoms/metabolism , Gene Regulatory Networks , Lipid Metabolism , Microalgae/genetics , Microalgae/metabolism , Transcriptome
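
The abstract describes using graph algorithms to find a path from glucose to TAG in the smallest number of steps. A minimal sketch of that idea, assuming compounds are nodes and reactions are directed edges, is a breadth-first search. The abstract does not name the algorithm the authors used, and the toy network below uses placeholder intermediates rather than real biochemistry.

```python
from collections import deque

def shortest_path(graph, source, target):
    """Breadth-first search; returns the path with the fewest edges, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the predecessor links back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical toy network: each edge stands for one reaction.
# "A" to "D" are placeholder intermediates, not real compounds.
TOY_NETWORK = {
    "glucose": ["A", "B"],
    "A": ["C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["TAG"],
}

path = shortest_path(TOY_NETWORK, "glucose", "TAG")
print(path)  # ['glucose', 'B', 'D', 'TAG']
```

Breadth-first search is sufficient here because every reaction counts as one step; weighting edges by expression change would call for a weighted shortest-path method instead.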
8.
BMC Genomics ; 13: 699, 2012 Dec 12.
Article in English | MEDLINE | ID: mdl-23234305

ABSTRACT

BACKGROUND: One of the main goals of genomic analysis is to elucidate the comprehensive functions (functionome) in individual organisms or a whole community in various environments. However, a standard evaluation method for discerning the functional potentials harbored within the genome or metagenome has not yet been established. We have developed a new evaluation method for the potential functionome, based on the completion ratio of Kyoto Encyclopedia of Genes and Genomes (KEGG) functional modules. RESULTS: Distribution of the completion ratio of the KEGG functional modules in 768 prokaryotic species varied greatly with the kind of module, and all modules primarily fell into 4 patterns (universal, restricted, diversified and non-prokaryotic modules), indicating the universal and unique nature of each module, and also the versatility of the KEGG Orthology (KO) identifiers mapped to each one. The module completion ratio in 8 phenotypically different bacilli revealed that some modules were shared only in phenotypically similar species. Metagenomes of human gut microbiomes from 13 healthy individuals previously determined by the Sanger method were analyzed based on the module completion ratio. Results led to new discoveries in the nutritional preferences of gut microbes, believed to be one of the mutualistic representations of gut microbiomes to avoid nutritional competition with the host. CONCLUSIONS: The method developed in this study could characterize the functionome harbored in genomes and metagenomes. As this method also provided taxonomical information from KEGG modules as well as the gene hosts constructing the modules, interpretation of completion profiles was simplified and we could identify the complementarity between biochemical functions in human hosts and the nutritional preferences in human gut microbiomes. Thus, our method has the potential to be a powerful tool for comparative functional analysis in genomics and metagenomics, able to target unknown environments containing various uncultivable microbes within unidentified phyla.


Subject(s)
Computational Biology/methods , Databases, Genetic , Gastrointestinal Tract/microbiology , Genomics/methods , Metagenome/genetics , Proteins/physiology , Humans , Proteins/genetics , Species Specificity
9.
Database (Oxford) ; 2011: bar046, 2011.
Article in English | MEDLINE | ID: mdl-22039163

ABSTRACT

CELLPEDIA is a repository database for current knowledge about human cells. It contains various types of information, such as cell morphologies, gene expression and literature references. The major role of CELLPEDIA is to provide a digital dictionary of human cells for the biomedical field, including support for the characterization of artificially generated cells in regenerative medicine. CELLPEDIA features (i) its own cell classification scheme, in which whole human cells are classified by their physical locations in addition to conventional taxonomy; and (ii) cell differentiation pathways compiled from biomedical textbooks and journal papers. Currently, human differentiated cells and stem cells are classified into 2260 and 66 cell taxonomy keys, respectively, from which 934 parent-child relationships reported in cell differentiation or transdifferentiation pathways are retrievable. As far as we know, this is the first attempt to develop a digital cell bank to function as a public resource for the accumulation of current knowledge about human cells. The CELLPEDIA homepage is freely accessible except for the data submission pages that require authentication (please send a password request to cell-info@cbrc.jp). Database URL: http://cellpedia.cbrc.jp/


Subject(s)
Cell Physiological Phenomena , Cells/classification , Database Management Systems , Databases, Factual , Cell Differentiation , Humans , User-Computer Interface
10.
Genome Inform ; 25(1): 53-60, 2011.
Article in English | MEDLINE | ID: mdl-22230939

ABSTRACT

We developed linear regression models which predict strength of transcriptional activity of promoters from their sequences. Intrinsic transcriptional strength data of 451 human promoter sequences in three cell lines (HEK293, MCF7 and 3T3), which were measured by systematic luciferase reporter gene assays, were used to build the models. The models sum up contributions of CG dinucleotide content and transcription factor binding sites (TFBSs) to transcriptional strength. We evaluated prediction accuracies of the models by cross validation tests and found that they have adequate ability for predicting transcriptional strength of promoters in spite of their simple formalization. We also evaluated statistical significance of the contributions and proposed a picture of regulatory code hidden in promoter sequences. That is, CG dinucleotide content and TFBSs mainly determine strength of transcriptional activity under ubiquitous and specific environments, respectively.


Subject(s)
Models, Genetic , Promoter Regions, Genetic , Transcription, Genetic , 3T3 Cells , Animals , Base Composition , Binding Sites , HEK293 Cells , Humans , Linear Models , MCF-7 Cells , Mice , Transcription Factors/metabolism
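
The abstract states that the models sum the contributions of CG dinucleotide content and TFBSs to transcriptional strength. A minimal sketch of such an additive model is shown below. The feature encoding (CG fraction plus per-factor site counts), the factor names `TF_A` and `TF_B`, and all weights are assumptions made for illustration and are not fitted values from the paper.

```python
def cg_dinucleotide_content(sequence):
    """Fraction of adjacent base pairs in the sequence that are 'CG'."""
    seq = sequence.upper()
    if len(seq) < 2:
        return 0.0
    cg = sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "CG")
    return cg / (len(seq) - 1)

def predict_strength(sequence, tfbs_counts, intercept, cg_weight, tfbs_weights):
    """Additive linear model: intercept + CG term + sum of TFBS terms."""
    score = intercept + cg_weight * cg_dinucleotide_content(sequence)
    for factor, count in tfbs_counts.items():
        # Factors without a fitted weight contribute nothing.
        score += tfbs_weights.get(factor, 0.0) * count
    return score

# Hypothetical promoter, site counts and weights (not values from the paper).
promoter = "CGCGA"
strength = predict_strength(
    promoter,
    tfbs_counts={"TF_A": 2, "TF_B": 1},
    intercept=1.0,
    cg_weight=2.0,
    tfbs_weights={"TF_A": 0.5, "TF_B": -0.25},
)
print(strength)  # 2.75
```

In practice the weights would be estimated by least squares from measured reporter activities and checked by cross-validation, as the abstract describes.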
11.
PLoS One ; 5(8): e11881, 2010 Aug 27.
Article in English | MEDLINE | ID: mdl-20806061

ABSTRACT

How to identify true transcription factor binding sites on the basis of sequence motif information (e.g., motif pattern, location, combination, etc.) is an important question in bioinformatics. We present "PeakRegressor," a system that identifies binding motifs by combining DNA-sequence data and ChIP-Seq data. PeakRegressor uses L1-norm log linear regression in order to predict peak values from binding motif candidates. Our approach successfully predicts the peak values of STAT1 and RNA Polymerase II with correlation coefficients as high as 0.65 and 0.66, respectively. Using PeakRegressor, we could identify composite motifs for STAT1, as well as potential regulatory SNPs (rSNPs) involved in the regulation of transcription levels of neighboring genes. In addition, we show that among five regression methods, L1-norm log linear regression achieves the best performance with respect to binding motif identification, biological interpretability and computational efficiency.


Subject(s)
Computational Biology , Polymorphism, Single Nucleotide/genetics , Regulatory Sequences, Nucleic Acid/genetics , Repetitive Sequences, Nucleic Acid/genetics , STAT1 Transcription Factor/metabolism , Base Sequence , Binding Sites , Linear Models , Principal Component Analysis , RNA Polymerase II/metabolism
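
The abstract says PeakRegressor predicts peak values from motif candidates with L1-norm log linear regression, without giving the exact formulation. As a hedged sketch, the code below regresses the logarithm of peak values on motif features using an L1 penalty, solved by plain coordinate descent with no intercept. The motif features, the peak values and the penalty `lam` are invented for illustration.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; the core of the L1 update."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * sum((y - Xw)^2) + lam * sum(|w|) without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from all features except feature j.
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            rho, z = rho / n, z / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w

# Hypothetical data: columns are presence of motif_A and motif_B in each peak.
X = [[1, 0], [0, 1], [1, 0], [0, 1]]
peak_values = [8.0, 1.1, 8.0, 1.1]
log_peaks = [math.log(v) for v in peak_values]  # the "log linear" part

weights = lasso_coordinate_descent(X, log_peaks, lam=0.1)
print(weights)  # motif_B's weight is driven exactly to zero by the L1 penalty
```

The L1 penalty is what makes the result interpretable: weakly informative motifs drop out of the model entirely, leaving a short list of candidate binding motifs.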
12.
J Toxicol Sci ; 35(1): 115-23, 2010 Feb.
Article in English | MEDLINE | ID: mdl-20118632

ABSTRACT

Profiles of Chemical Effects on Cells (pCEC) is a toxicogenomics database with a system of classifying chemicals that have effects on human health. This database stores and handles gene expression profiling information and categories of toxicity data. Chemicals are classified according to the specific tissues and cells they affect, the gene expression changes they induce, their toxicity and biological functions in this database system. The pCEC system also analyzes relationships between chemicals and the genes they affect in specific tissues and cells. We developed pCEC to support decision-making within the context of environmental regulation. In particular, exposure to environmental chemicals during fetal and newborn development may result in a predisposition to various disorders such as cancer, learning disabilities and allergies later in life. The identification and prediction of hazardous chemicals using limited information are important issues in human health risk management. Therefore, various toxicity information including lethal dose 50 (LD50), toxicity pathways and pathological data were loaded into pCEC. pCEC also provides facilities for querying, analyzing and predicting unknown toxicochemical reaction pathways and biomarkers, based on toxicoinformatic data mining approaches. This database is available online at http://project.nies.go.jp/eCA/cgi-bin/index.cgi. The current version of the database has information on the hepatotoxicity, reproductive toxicity and embryotoxicity of chemicals.


Subject(s)
Databases as Topic , Environmental Pollutants/toxicity , Risk Assessment/methods , Toxicogenetics , Animals , Computational Biology , Databases, Factual , Environmental Pollutants/classification , Gene Expression Profiling , Humans , Lethal Dose 50 , Predictive Value of Tests , Protein Array Analysis
13.
Methods Mol Biol ; 577: 55-65, 2009.
Article in English | MEDLINE | ID: mdl-19718508

ABSTRACT

Gene clustering is one of the main themes of data mining approaches in bioinformatics. Although it has the power to analyze gene function, interpretation of the results becomes increasingly difficult when the number of experiments (samples) reaches the hundreds or more. A new type of clustering called "biclustering," where genes and experiments are coclustered in large-scale gene expression data, has been extensively studied in the last decade. We have developed "SAMURAI," an original program that detects all the biclusters or "gene modules" whose genes have expression patterns similar to a query profile using the ultrafast data mining algorithm called Linear-time Closed itemset Miner (LCM). Using a chemical toxicity dataset from J&J rat liver experiments, we compiled an exhaustive dictionary of gene modules by searching datasets of gene modules with each chemical exposure experiment as query. Through the module analysis, we found that our program can detect up/down-regulated gene sets that significantly represent particular GO functions or KEGG pathways, thereby unraveling reactions and mechanisms common to different toxicochemical treatments of hepatocytes.


Subject(s)
Gene Expression Profiling/statistics & numerical data , Toxicology/statistics & numerical data , Algorithms , Animals , Cluster Analysis , Computational Biology , Databases, Factual , Liver/drug effects , Liver/metabolism , Molecular Biology/methods , Rats
14.
DNA Res ; 15(6): 387-96, 2008 Dec.
Article in English | MEDLINE | ID: mdl-18940874

ABSTRACT

Recent advances in DNA sequencers are accelerating genome sequencing, especially in microbes, and complete and draft genomes from various species have been sequenced in rapid succession. Here, we present a comprehensive gene prediction tool, the MetaGeneAnnotator (MGA), which precisely predicts all kinds of prokaryotic genes from a single or a set of anonymous genomic sequences having a variety of lengths. The MGA integrates statistical models of prophage genes, in addition to those of bacterial and archaeal genes, and also uses a self-training model from input sequences for predictions. As a result, the MGA sensitively detects not only typical genes but also atypical genes, such as horizontally transferred and prophage genes in a prokaryotic genome. In this paper, we also propose a novel approach for analyzing the ribosomal binding site (RBS), which enables us to detect species-specific patterns of the RBSs. The MGA has the ingenious RBS model based on this approach, and precisely predicts translation starts of genes. The MGA also succeeds in improving prediction accuracies for short sequences by using the adapted RBS models (96% sensitivity and 93% specificity for 700 bp fragments). These features of the MGA expedite wide ranges of microbial genome studies, such as genome annotations and metagenome analyses.


Subject(s)
Computational Biology/methods , Genes, Bacterial , Genes, Viral , Genome, Bacterial/genetics , Genome, Viral/genetics , Ribosomes/metabolism , Algorithms , Bacteria/genetics , Bacteriophages/genetics , Binding Sites , Plasmids/genetics , Predictive Value of Tests , Protein Biosynthesis , Species Specificity
15.
Bioinformatics ; 23(22): 3103-4, 2007 Nov 15.
Article in English | MEDLINE | ID: mdl-17895274

ABSTRACT

The establishment and rapid expansion of microarray databases has created a need for new search tools. Here we present CellMontage, the first server for expression profile similarity search over a large database: 69 000 microarray experiments derived from NCBI's GEO site. CellMontage provides a novel, content-based search engine for accessing gene expression data. Microarray experiments with similar overall expression to a user-provided expression profile (e.g. microarray experiment) are computed and displayed, usually within 20 s. The core search engine software is downloadable from the site.


Subject(s)
Database Management Systems , Databases, Protein , Gene Expression Profiling/methods , Information Storage and Retrieval/methods , Internet , Oligonucleotide Array Sequence Analysis/methods , Proteins/chemistry , Proteins/metabolism , User-Computer Interface , Algorithms , Proteins/classification , Software
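
Expression profile similarity search of the kind described above can be sketched as ranking stored experiments by their correlation with a query profile. The abstract does not specify the similarity measure, so Pearson correlation is an assumption here, and the experiment names and values are invented.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def search(query, database, top_k=3):
    """Return the top_k (name, score) pairs, most similar profile first."""
    scored = [(pearson(query, profile), name) for name, profile in database.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(name, score) for score, name in scored[:top_k]]

# Hypothetical database: one expression value per gene, same gene order throughout.
DATABASE = {
    "exp_1": [1.0, 2.0, 3.0, 4.0],
    "exp_2": [4.0, 3.0, 2.0, 1.0],
    "exp_3": [1.0, 3.0, 2.0, 4.0],
}

hits = search([2.0, 4.0, 6.0, 8.0], DATABASE)
print([name for name, _ in hits])  # ['exp_1', 'exp_3', 'exp_2']
```

A linear scan like this would be far too slow for tens of thousands of experiments with interactive response times, so a production engine would need indexing or approximation that this sketch does not attempt.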