Results 1 - 20 of 22
1.
Cancer Res ; 80(5): 1143-1155, 2020 03 01.
Article in English | MEDLINE | ID: mdl-31932456

ABSTRACT

Considerable metabolic reprogramming has been observed in a conserved manner across multiple cancer types, but their true causes remain elusive. We present an analysis of around 50 such reprogrammed metabolisms (RM) including the Warburg effect, nucleotide de novo synthesis, and sialic acid biosynthesis in cancer. Analyses of the biochemical reactions conducted by these RMs, coupled with gene expression data of their catalyzing enzymes, in 7,011 tissues of 14 cancer types, revealed that all RMs produce more H+ than their original metabolisms. These data strongly support a model that these RMs are induced or selected to neutralize a persistent intracellular alkaline stress due to chronic inflammation and local iron overload. To sustain these RMs for survival, cells must find metabolic exits for the nonproton products of these RMs in a continuous manner, some of which pose major challenges, such as nucleotides and sialic acids, because they are electrically charged. This analysis strongly suggests that continuous cell division and other cancerous behaviors are ways for the affected cells to remove such products in a timely and sustained manner. As supporting evidence, this model can offer simple and natural explanations to a range of long-standing open questions in cancer research including the cause of the Warburg effect. SIGNIFICANCE: Inhibiting acidifying metabolic reprogramming could be a novel strategy for treating cancer.


Subject(s)
Energy Metabolism , Glycolysis , Mitochondria/pathology , Neoplasms/pathology , Protons , Cell Proliferation , Cell Survival , Cytosol/pathology , Female , Humans , Male , Metabolic Networks and Pathways , N-Acetylneuraminic Acid/biosynthesis , Nucleotides/biosynthesis , RNA-Seq
2.
PLoS One ; 9(6): e98844, 2014.
Article in English | MEDLINE | ID: mdl-24892935

ABSTRACT

A challenge in phylogenetic inference of gene trees is how to properly sample a large pool of homologous sequences to derive a good representative subset of sequences. Such a need arises in various applications, e.g. when (1) accuracy-oriented phylogenetic reconstruction methods may not be able to deal with a large pool of sequences due to their high demand in computing resources; (2) applications analyzing a collection of gene trees may prefer to use trees with fewer operational taxonomic units (OTUs), for instance for the detection of horizontal gene transfer events by identifying phylogenetic conflicts; and (3) the pool of available sequences is biased towards extensively studied species. In the past, the creation of subsamples often relied on manual selection. Here we present an Automated sequence-Sampling method for improving the Taxonomic diversity of gene phylogenetic trees, AST, to obtain representative sequences that maximize the taxonomic diversity of the sampled sequences. To demonstrate the effectiveness of AST, we have tested it to solve four problems, namely, inference of the evolutionary histories of the small ribosomal subunit protein S5 of E. coli, 16S ribosomal RNAs and glycosyl-transferase gene family 8, and a study of ancient horizontal gene transfers from bacteria to plants. Our results show that the resolution of our computational results is almost as good as that of manual inference by domain experts, hence making the tool generally useful to phylogenetic studies by non-phylogeny specialists. The program is available at http://csbl.bmb.uga.edu/~zhouchan/AST.php.


Subject(s)
Computational Biology/methods , Phylogeny , RNA, Ribosomal/genetics , Sequence Analysis, DNA
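
The sampling idea in the abstract above can be illustrated with a minimal sketch. This is not the AST program itself: it assumes each sequence carries a lineage tuple, and it uses a simple round-robin over taxa at one rank as a stand-in for whatever diversity criterion AST actually applies. The function name `sample_diverse` and its parameters are hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def sample_diverse(sequences, n, rank=0):
    """Pick up to n sequence ids, spreading picks across taxa.

    sequences: list of (seq_id, lineage) pairs, where lineage is a tuple
               of taxon names (hypothetical input format).
    rank:      index into the lineage tuple used for grouping.
    """
    # Group sequence ids by the taxon found at the chosen rank.
    groups = defaultdict(list)
    for seq_id, lineage in sequences:
        groups[lineage[rank]].append(seq_id)

    picked = []
    # Round-robin: take one sequence from every taxon, then a second, etc.,
    # so over-represented taxa cannot crowd out rare ones.
    for one_round in zip_longest(*groups.values()):
        for seq_id in one_round:
            if seq_id is not None and len(picked) < n:
                picked.append(seq_id)
    return picked


pool = [
    ("a1", ("Proteobacteria",)),
    ("a2", ("Proteobacteria",)),
    ("a3", ("Proteobacteria",)),
    ("b1", ("Firmicutes",)),
    ("c1", ("Cyanobacteria",)),
]
subset = sample_diverse(pool, 3)
```

With a pool biased towards one phylum, the round-robin still returns one sequence per phylum before taking a second from the dominant one.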
3.
Nucleic Acids Res ; 42(Database issue): D654-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24214966

ABSTRACT

We have recently developed a new version of the DOOR operon database, DOOR 2.0, which is available online at http://csbl.bmb.uga.edu/DOOR/ and will be updated on a regular basis. DOOR 2.0 contains genome-scale operons for 2072 prokaryotes with complete genomes, three times the number of genomes covered in the previous version published in 2009. DOOR 2.0 has a number of new features, compared with its previous version, including (i) more than 250,000 transcription units, experimentally validated or computationally predicted based on RNA-seq data, providing a dynamic functional view of the underlying operons; (ii) an integrated operon-centric data resource that provides not only operons for each covered genome but also their functional and regulatory information such as their cis-regulatory binding sites for transcription initiation and termination, gene expression levels estimated based on RNA-seq data and conservation information across multiple genomes; (iii) a high-performance web service for online operon prediction on user-provided genomic sequences; (iv) an intuitive genome browser to support visualization of user-selected data; and (v) a keyword-based Google-like search engine for finding the needed information intuitively and rapidly in this database.


Subject(s)
Bacteria/genetics , Databases, Genetic , Operon , Genome, Archaeal , Genome, Bacterial , Internet , Regulatory Elements, Transcriptional , Transcription, Genetic
4.
PLoS One ; 8(8): e71177, 2013.
Article in English | MEDLINE | ID: mdl-23967163

ABSTRACT

The rapid growth of cancer cells fueled by glycolysis produces large amounts of protons in cancer cells, which triggers mechanisms to transport them out, hence leading to increased acidity in their extracellular environments. It has been well established that the increased acidity will induce cell death of normal cells but not cancer cells. The main question we address here is how cancer cells deal with the increased acidity to avoid the activation of apoptosis. We have carried out a comparative analysis of transcriptomic data of six solid cancer types, breast, colon, liver, two lung (adenocarcinoma, squamous cell carcinoma) and prostate cancers, and proposed a model of how cancer cells utilize a few mechanisms to keep the protons outside of the cells. The model consists of a number of previously, well or partially, studied mechanisms for transporting out the excess protons, such as through the monocarboxylate transporters, V-ATPases, NHEs and the one facilitated by carbonic anhydrases. In addition we propose a new mechanism that neutralizes protons through the conversion of glutamate to γ-aminobutyrate, which consumes one proton per reaction. We hypothesize that these processes are regulated by cancer related conditions such as hypoxia and growth factors and by the pH levels, making these encoded processes not available to normal cells under acidic conditions.


Subject(s)
Gene Expression Profiling , Neoplasms/genetics , Neoplasms/pathology , Carbonic Anhydrases/metabolism , Humans , Hydrogen-Ion Concentration , Intracellular Space/metabolism , Neoplasms/metabolism , Protons , Sodium/metabolism , Vacuolar Proton-Translocating ATPases/metabolism
5.
PLoS One ; 8(6): e66817, 2013.
Article in English | MEDLINE | ID: mdl-23825567

ABSTRACT

Systematic determination of gene function is an essential step in fully understanding the precise contribution of each gene for the proper execution of molecular functions in the cell. Gene functional linkage describes the relationship among a group of genes with similar functions. With thousands of genomes sequenced, there arises a great opportunity to utilize gene evolutionary information to identify gene functional linkages. To this end, we established a computational method (called TRACE) to trace gene footprints through a gene functional network constructed from 341 prokaryotic genomes. TRACE performance was validated and successfully tested to predict enzyme functions as well as components of pathways. A so far undescribed chromosome partitioning-like protein, ro03654, of the oleaginous bacterium Rhodococcus sp. RHA1 (RHA1) was predicted and verified experimentally, with its deletion mutant showing growth inhibition compared to RHA1 wild type. In addition, four proteins were predicted to act as prokaryotic SNARE-like proteins, and two of them were shown to be localized at the plasma membrane. Thus, we believe that TRACE is an effective new method to infer prokaryotic gene functional linkages by tracing evolutionary events.


Subject(s)
Bacteria/genetics , Evolution, Molecular , Genes, Bacterial/genetics , Genetic Linkage , Genomics/methods , Bacterial Proteins/genetics , Chromosomes, Bacterial/genetics , Operon/genetics
6.
PLoS One ; 8(2): e56726, 2013.
Article in English | MEDLINE | ID: mdl-23457606

ABSTRACT

We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome, (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.


Subject(s)
Computer Graphics , Genomics/methods , Software , Algorithms , Cluster Analysis , Data Mining , Escherichia coli K12/genetics , Escherichia coli O157/genetics , Genomic Islands/genetics , Metagenomics , Sequence Analysis
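
To make capability (a) of the abstract above concrete, here is a minimal sketch of a k-mer frequency matrix computed over non-overlapping windows. It is an illustration under assumptions, not the server's implementation: the function name `kmer_barcode`, the default values of `k` and `window`, and the choice to count each k-mer separately from its reverse complement are all hypothetical simplifications.

```python
from itertools import product


def kmer_barcode(seq, k=4, window=1000):
    """Return one row of k-mer frequencies per non-overlapping window.

    Each row has 4**k columns (one per k-mer over A, C, G, T) and sums
    to 1 whenever the window contains at least one valid k-mer.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    rows = []
    for start in range(0, len(seq) - window + 1, window):
        frag = seq[start:start + window].upper()
        counts = [0] * len(kmers)
        total = 0
        for i in range(len(frag) - k + 1):
            j = index.get(frag[i:i + k])  # None for k-mers with N etc.
            if j is not None:
                counts[j] += 1
                total += 1
        rows.append([c / total if total else 0.0 for c in counts])
    return rows


barcode = kmer_barcode("ACGT" * 5, k=2, window=10)
```

Rendering each row as a strip of grey levels gives the image-like view; windows whose rows differ strongly from the genome-wide pattern are the kind of fragments capability (b) refers to.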
7.
BMC Bioinformatics ; 13: 123, 2012 Jun 07.
Article in English | MEDLINE | ID: mdl-22676320

ABSTRACT

BACKGROUND: The frequent exchange of genetic material among prokaryotes means that extracting a majority or plurality phylogenetic signal from many gene families, and the identification of gene families that are in significant conflict with the plurality signal is a frequent task in comparative genomics, and especially in phylogenomic analyses. Decomposition of gene trees into embedded quartets (unrooted trees each with four taxa) is a convenient and statistically powerful technique to address this challenging problem. This approach was shown to be useful in several studies of completely sequenced microbial genomes. RESULTS: We present here a web server that takes a collection of gene phylogenies, decomposes them into quartets, generates a Quartet Spectrum, and draws a split network. Users are also provided with various data download options for further analyses. Each gene phylogeny is to be represented by an assessment of phylogenetic information content, such as sets of trees reconstructed from bootstrap replicates or sampled from a posterior distribution. The Quartet Decomposition server is accessible at http://quartets.uga.edu. CONCLUSIONS: The Quartet Decomposition server presented here provides a convenient means to perform Quartet Decomposition analyses and will empower users to find statistically supported phylogenetic conflicts.


Subject(s)
Multigene Family , Phylogeny , Software , Genes, Bacterial , Genomics/methods , Internet
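
The decomposition step described above can be sketched as follows. This is a small illustration, not the server's code: trees are assumed to be nested tuples of leaf names, and a quartet topology is reported only when some clade of the tree separates two of the four taxa from the other two. The names `leaf_sets` and `quartets` are hypothetical.

```python
from itertools import combinations


def leaf_sets(tree, out):
    """Append the leaf set under every internal node of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def quartets(tree):
    """Return the resolved quartets of a tree as sets of two leaf pairs."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    found = set()
    for a, b, c, d in combinations(sorted(all_leaves), 4):
        # The three possible resolved topologies for four taxa.
        for pair, rest in (((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))):
            for clade in clades:
                same_pair = (pair[0] in clade) == (pair[1] in clade)
                same_rest = (rest[0] in clade) == (rest[1] in clade)
                separated = (pair[0] in clade) != (rest[0] in clade)
                if same_pair and same_rest and separated:
                    found.add(frozenset([frozenset(pair), frozenset(rest)]))
                    break
    return found


tree = (("A", "B"), ("C", ("D", "E")))
qs = quartets(tree)
```

Counting how often each quartet topology appears across many gene trees (or bootstrap replicates) is the kind of tally a quartet spectrum summarizes.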
8.
Nucleic Acids Res ; 40(Web Server issue): W445-51, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22645317

ABSTRACT

Carbohydrate-active enzymes (CAZymes) are very important to the biotech industry, particularly the emerging biofuel industry because CAZymes are responsible for the synthesis, degradation and modification of all the carbohydrates on Earth. We have developed a web resource, dbCAN (http://csbl.bmb.uga.edu/dbCAN/annotate.php), to provide a capability for automated CAZyme signature domain-based annotation for any given protein data set (e.g. proteins from a newly sequenced genome) submitted to our server. To accomplish this, we have explicitly defined a signature domain for every CAZyme family, derived based on the CDD (conserved domain database) search and literature curation. We have also constructed a hidden Markov model to represent the signature domain of each CAZyme family. These CAZyme family-specific HMMs are our key contribution and the foundation for the automated CAZyme annotation.


Subject(s)
Carbohydrate Metabolism , Enzymes/chemistry , Molecular Sequence Annotation , Software , Enzyme Activation , Enzymes/classification , Enzymes/metabolism , Internet , Metagenome , Protein Structure, Tertiary , Sequence Alignment
9.
BMC Bioinformatics ; 12 Suppl 1: S1, 2011 Feb 15.
Article in English | MEDLINE | ID: mdl-21342538

ABSTRACT

BACKGROUND: Reconstruction of biological pathways is typically done through mapping well-characterized pathways of model organisms to a target genome, through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways, e.g., some genes in a target pathway may not have corresponding genes in the template pathways, the so-called "missing gene" problem. METHODS: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, to fill holes caused by missing genes as well as to expand the mapped pathway model. The basic idea of the algorithm is to identify genes in the target genome whose homologous genes share common operons with homologs of any mapped pathway genes in some reference genome, and to add such genes to the target pathway if their functions are consistent with the cellular function of the target pathway. RESULTS: We have implemented this idea using a graph-theoretic approach and demonstrated the effectiveness of the algorithm on known pathways of E. coli in the KEGG database. On all KEGG pathways containing at least 5 genes, our method achieves an average of 60% positive predictive value (PPV) and the performance is increased with more seed genes added. Analysis shows that our method is highly robust. CONCLUSIONS: An effective method is presented to find missing genes in biological pathways of prokaryotes, which achieves high prediction reliability on E. coli at a genome level. Numerous missing genes are found to be related to known E. coli pathways, which can be further validated through biological experiments. Overall this method is robust and can be used for functional inference.


Subject(s)
Algorithms , Chromosome Mapping/methods , Computational Biology/methods , Genomics/methods , Escherichia coli/genetics , Genome, Archaeal , Genome, Bacterial , Operon , Phylogeny
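
The basic idea stated in the METHODS section above can be sketched in a few lines. This is a simplified reading, not the published graph-theoretic algorithm: it uses a single reference genome, it omits the check that a candidate's function is consistent with the pathway, and the input format and the name `expand_pathway` are hypothetical.

```python
def expand_pathway(seed_genes, homolog_of, reference_operons):
    """Suggest target genes whose homologs share an operon with seed homologs.

    seed_genes:        set of target-genome genes already mapped to the pathway.
    homolog_of:        dict mapping target gene -> reference-genome homolog.
    reference_operons: list of sets of reference-genome genes.
    """
    # Reference-genome homologs of the genes already in the pathway.
    seed_refs = {homolog_of[g] for g in seed_genes if g in homolog_of}

    candidates = set()
    for gene, ref in homolog_of.items():
        if gene in seed_genes:
            continue
        for operon in reference_operons:
            # Candidate if its homolog sits in an operon with a seed homolog.
            if ref in operon and operon & (seed_refs - {ref}):
                candidates.add(gene)
                break
    return candidates


candidates = expand_pathway(
    seed_genes={"t1"},
    homolog_of={"t1": "r1", "t2": "r2", "t3": "r3"},
    reference_operons=[{"r1", "r2"}, {"r3", "r4"}],
)
```

Here `t2` is proposed because its homolog `r2` shares an operon with `r1`, the homolog of the seed gene, while `t3` is not.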
10.
Article in English | MEDLINE | ID: mdl-19407357

ABSTRACT

Large bioinformatics data sets make cluster identification time-consuming, which is why a parallel algorithm is needed for identifying dense clusters in a noisy background. Our algorithm works on a graph representation of the data set to be analyzed. It identifies clusters through the identification of densely intraconnected subgraphs. We have employed a minimum spanning tree (MST) representation of the graph and solve the cluster identification problem using this representation. The computational bottleneck of our algorithm is the construction of an MST of a graph, for which a parallel algorithm is employed. Our high-level strategy for the parallel MST construction algorithm is to first partition the graph, then construct MSTs for the partitioned subgraphs and auxiliary bipartite graphs based on the subgraphs, and finally merge these MSTs to derive an MST of the original graph. The computational results indicate that when running on 150 CPUs, our algorithm can solve a cluster identification problem on a data set with 1,000,000 data points almost 100 times faster than on a single CPU, indicating that this program is capable of handling very large data clustering problems in an efficient manner. We have implemented the clustering algorithm as the software CLUMP.


Subject(s)
Algorithms , Cluster Analysis , Computational Biology/methods , Databases, Genetic , Pattern Recognition, Automated/methods , Linear Models , Multigene Family , Reproducibility of Results , Software , Systems Integration
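
As a rough illustration of MST-based clustering, the sketch below builds an MST serially with Kruskal's algorithm and then cuts edges longer than a cutoff. It is not CLUMP: the parallel partition-and-merge construction is not shown, and the cutoff rule is a swapped-in simplification of the dense-cluster criterion. The names `DisjointSet`, `mst_edges`, `mst_clusters`, and `cutoff` are hypothetical.

```python
class DisjointSet:
    """Union-find over nodes 0..n-1 with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def mst_edges(n, edges):
    """Kruskal's algorithm. edges is a list of (weight, u, v) tuples."""
    dsu = DisjointSet(n)
    return [(w, u, v) for w, u, v in sorted(edges) if dsu.union(u, v)]


def mst_clusters(n, edges, cutoff):
    """Cut MST edges heavier than cutoff; return the remaining components."""
    dsu = DisjointSet(n)
    for w, u, v in mst_edges(n, edges):
        if w <= cutoff:
            dsu.union(u, v)
    groups = {}
    for node in range(n):
        groups.setdefault(dsu.find(node), []).append(node)
    return sorted(groups.values())


edges = [(1, 0, 1), (1, 1, 2), (2, 0, 2), (10, 2, 3), (1, 3, 4)]
tree = mst_edges(5, edges)
clusters = mst_clusters(5, edges, cutoff=5)
```

The single long edge between nodes 2 and 3 is the only one removed, which splits the five points into two clusters.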
11.
Nucleic Acids Res ; 37(Database issue): D459-63, 2009 Jan.
Article in English | MEDLINE | ID: mdl-18988623

ABSTRACT

We present a database DOOR (Database for prOkaryotic OpeRons) containing computationally predicted operons of all the sequenced prokaryotic genomes. All the operons in DOOR are predicted using our own prediction program, which was ranked to be the best among 14 operon prediction programs by a recent independent review. Currently, the DOOR database contains operons for 675 prokaryotic genomes, and supports a number of search capabilities to facilitate easy access and utilization of the information stored in it. (1) Querying the database: the database provides a search capability for a user to find desired operons and associated information through multiple querying methods. (2) Searching for similar operons: the database provides a search capability for a user to find operons that have similar composition and structure to a query operon. (3) Prediction of cis-regulatory motifs: the database provides a capability for motif identification in the promoter regions of a user-specified group of possibly coregulated operons, using motif-finding tools. (4) Operons for RNA genes: the database includes operons for RNA genes. (5) OperonWiki: the database provides a wiki page (OperonWiki) to facilitate interactions between users and the developer of the database. We believe that DOOR provides a useful resource to many biologists working on bacteria and archaea, which can be accessed at http://csbl1.bmb.uga.edu/OperonDB.


Subject(s)
Databases, Genetic , Genome, Archaeal , Genome, Bacterial , Operon , Genomics , Software
12.
Comput Biol Chem ; 32(3): 176-84, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18440870

ABSTRACT

Functional classification of genes represents one of the most basic problems in genome analysis and annotation. Our analysis of some of the popular methods for functional classification of genes shows that these methods are not always consistent with each other and may not be specific enough for high-resolution gene functional annotations. We have developed a method to integrate genomic neighborhood information of genes with their sequence similarity information for the functional classification of prokaryotic genes. The application of our method to 93 proteobacterial genomes has shown that (i) the genomic neighborhoods are much more conserved across prokaryotic genomes than expected by chance, and such conservation can be utilized to improve functional classification of genes; (ii) while our method is consistent with the existing popular schemes as much as they are among themselves, it does provide functional classification at higher resolution and hence allows functional assignments of (new) genes at a more specific level; and (iii) our method is fairly stable when being applied to different genomes.


Subject(s)
Algorithms , Classification/methods , Computational Biology/methods , Genes, Bacterial/physiology , Genomics/methods , Prokaryotic Cells/physiology , Cluster Analysis , Computer Simulation , Genes, Bacterial/genetics , Sensitivity and Specificity
13.
Nucleic Acids Res ; 35(7): 2125-40, 2007.
Article in English | MEDLINE | ID: mdl-17353185

ABSTRACT

Functional classification of genes represents a fundamental problem to many biological studies. Most of the existing classification schemes are based on the concepts of homology and orthology, which were originally introduced to study gene evolution but might not be the most appropriate for gene function prediction, particularly at high resolution level. We have recently developed a scheme for hierarchical classification of genes (HCGs) in prokaryotes. In the HCG scheme, the functional equivalence relationships among genes are first assessed through a careful application of both sequence similarity and genomic neighborhood information; and genes are then classified into a hierarchical structure of clusters, where genes in each cluster are functionally equivalent at some resolution level, and the level of resolution goes higher as the clusters become increasingly smaller traveling down the hierarchy. The HCG scheme is validated through comparisons with the taxonomy of the prokaryotic genomes, Clusters of Orthologous Groups (COGs) of genes and the Pfam system. We have applied the HCG scheme to 224 complete prokaryotic genomes, and constructed a HCG database consisting of a forest of 5339 multi-level and 15 770 single-level trees of gene clusters covering approximately 93% of the genes of these 224 genomes. The validation results indicate that the HCG scheme not only captures the key features of the existing classification schemes but also provides a much richer organization of genes which can be used for functional prediction of genes at higher resolution and to help reveal evolutionary trace of the genes.


Subject(s)
Computational Biology/methods , Genes, Bacterial , Genomics/methods , Bacteria/classification , Cluster Analysis , DNA-Binding Proteins/classification , DNA-Binding Proteins/genetics , Genome, Bacterial , Ribonucleotide Reductases/classification , Ribonucleotide Reductases/genetics
14.
Nucleic Acids Res ; 34(8): 2418-27, 2006.
Article in English | MEDLINE | ID: mdl-16682449

ABSTRACT

We present a study on computational identification of uber-operons in a prokaryotic genome, each of which represents a group of operons that are evolutionarily or functionally associated through operons in other (reference) genomes. Uber-operons represent a rich set of footprints of operon evolution, whose full utilization could lead to new and more powerful tools for elucidation of biological pathways and networks than what operons have provided, and a better understanding of prokaryotic genome structures and evolution. Our prediction algorithm predicts uber-operons through identifying groups of functionally or transcriptionally related operons, whose gene sets are conserved across the target and multiple reference genomes. Using this algorithm, we have predicted uber-operons for each of a group of 91 genomes, using the other 90 genomes as references. In particular, we predicted 158 uber-operons in Escherichia coli K12 covering 1830 genes, and found that many of the uber-operons correspond to parts of known regulons or biological pathways or are involved in highly related biological processes based on their Gene Ontology (GO) assignments. For some of the predicted uber-operons that are not parts of known regulons or pathways, our analyses indicate that their genes are highly likely to work together in the same biological processes, suggesting the possibility of new regulons and pathways. We believe that our uber-operon prediction provides a highly useful capability and a rich information source for elucidation of complex biological processes, such as pathways in microbes. All the prediction results are available at our Uber-Operon Database: http://csbl.bmb.uga.edu/uber, the first of its kind.


Subject(s)
Escherichia coli/genetics , Genome, Bacterial , Genomics/methods , Operon , Algorithms , Bacteria/metabolism , Bacterial Proteins/genetics , Citric Acid Cycle , Computational Biology , Evolution, Molecular , Membrane Proteins/genetics , Regulon , Reproducibility of Results , Sulfur/metabolism
15.
Nucleic Acids Res ; 34(3): 1050-65, 2006.
Article in English | MEDLINE | ID: mdl-16473855

ABSTRACT

Deciphering the regulatory networks encoded in the genome of an organism represents one of the most interesting and challenging tasks in the post-genome sequencing era. As an example of this problem, we have predicted a detailed model for the nitrogen assimilation network in cyanobacterium Synechococcus sp. WH 8102 (WH8102) using a computational protocol based on comparative genomics analysis and mining experimental data from related organisms that are relatively well studied. This computational model is in excellent agreement with the microarray gene expression data collected under ammonium-rich versus nitrate-rich growth conditions, suggesting that our computational protocol is capable of predicting biological pathways/networks with high accuracy. We then refined the computational model using the microarray data, and proposed a new model for the nitrogen assimilation network in WH8102. An intriguing discovery from this study is that nitrogen assimilation affects the expression of many genes involved in photosynthesis, suggesting a tight coordination between nitrogen assimilation and photosynthesis processes. Moreover, for some of these genes, this coordination is probably mediated by NtcA through the canonical NtcA promoters in their regulatory regions.


Subject(s)
Computational Biology/methods , Gene Expression Regulation, Bacterial , Genomics/methods , Models, Genetic , Nitrogen/metabolism , Synechococcus/genetics , Bacterial Proteins/classification , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , DNA-Binding Proteins/genetics , Gene Expression Profiling , Genome, Bacterial , Operon , Photosynthesis/genetics , Phylogeny , Promoter Regions, Genetic , Synechococcus/metabolism , Transcription Factors/genetics
16.
Proc Natl Acad Sci U S A ; 103(1): 129-34, 2006 Jan 03.
Article in English | MEDLINE | ID: mdl-16373500

ABSTRACT

Mapping biological pathways across microbial genomes is a highly important technique in functional studies of biological systems. Existing methods mainly rely on sequence-based orthologous gene mapping, which often leads to suboptimal mapping results because sequence-similarity information alone does not contain sufficient information for accurate identification of orthology relationship. Here we present an algorithm for pathway mapping across microbial genomes. The algorithm takes into account both sequence similarity and genomic structure information such as operons and regulons. One basic premise of our approach is that a microbial pathway could generally be decomposed into a few operons or regulons. We formulated the pathway-mapping problem to map genes across genomes to maximize their sequence similarity under the constraint that the mapped genes be grouped into a few operons, preferably coregulated in the target genome. We have developed an integer-programming algorithm for solving this constrained optimization problem and implemented the algorithm as a computer software program, p-map. We have tested p-map on a number of known homologous pathways. We conclude that using genomic structure information as constraints could greatly improve the pathway-mapping accuracy over methods that use sequence-similarity information alone.


Subject(s)
Algorithms , Chromosome Mapping/methods , Computational Biology/methods , Escherichia coli K12/genetics , Genome, Bacterial/genetics , Genomics/methods , Base Sequence , Genome Components , Sequence Homology , Species Specificity
17.
Nucleic Acids Res ; 33(16): 5156-71, 2005.
Article in English | MEDLINE | ID: mdl-16157864

ABSTRACT

We have developed a new method for prediction of cis-regulatory binding sites and applied it to predicting NtcA regulated genes in cyanobacteria. The algorithm rigorously utilizes concurrence information of multiple binding sites in the upstream region of a gene and that in the upstream regions of its orthologues in related genomes. A probabilistic model was developed for the evaluation of prediction reliability so that the prediction false positive rate could be well controlled. Using this method, we have predicted multiple new members of the NtcA regulons in nine sequenced cyanobacterial genomes, and showed that the false positive rates of the predictions have been reduced on an average of 40-fold compared to the conventional methods. A detailed analysis of the predictions in each genome showed that a significant portion of our predictions are consistent with previously published results about individual genes. Intriguingly, NtcA promoters are found for many genes involved in various stages of photosynthesis. Although photosynthesis is known to be tightly coordinated with nitrogen assimilation, very little is known about the underlying mechanism. We postulate for the first time that these genes serve as the regulatory points to orchestrate these two important processes in a cyanobacterial cell.


Subject(s)
Bacterial Proteins/metabolism , Cyanobacteria/genetics , Genomics/methods , Nitrogen/metabolism , Photosynthesis/genetics , Promoter Regions, Genetic , Regulon , Transcription Factors/metabolism , Algorithms , Amino Acid Sequence , Bacterial Proteins/chemistry , Binding Sites , Consensus Sequence , Conserved Sequence , Cyanobacteria/metabolism , DNA Footprinting , DNA-Binding Proteins/chemistry , DNA-Binding Proteins/metabolism , Gene Expression Regulation, Bacterial , Molecular Sequence Data , Protein Structure, Tertiary , Transcription Factors/chemistry
18.
Nucleic Acids Res ; 33(9): 2822-37, 2005.
Article in English | MEDLINE | ID: mdl-15901854

ABSTRACT

We present a computational method for the prediction of functional modules encoded in microbial genomes. In this work, we have also developed a formal measure to quantify the degree of consistency between the predicted and the known modules, and have carried out statistical significance analysis of consistency measures. We first evaluate the functional relationship between two genes from three different perspectives--phylogenetic profile analysis, gene neighborhood analysis and Gene Ontology assignments. We then combine the three different sources of information in the framework of Bayesian inference, and we use the combined information to measure the strength of gene functional relationship. Finally, we apply a threshold-based method to predict functional modules. By applying this method to Escherichia coli K12, we have predicted 185 functional modules. Our predictions are highly consistent with the previously known functional modules in E. coli. The application results have demonstrated that our approach is highly promising for the prediction of functional modules encoded in a microbial genome.


Subject(s)
Computational Biology/methods , Genome, Bacterial , Genomics/methods , Bayes Theorem , Data Interpretation, Statistical , Escherichia coli/genetics , Phylogeny , Reproducibility of Results
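
One common way to combine several evidence sources in a Bayesian framework is to multiply likelihood ratios into the prior odds. The sketch below shows that calculation only as an illustration: it assumes the three sources are conditionally independent, which the abstract does not state, and the function name and the numbers in the example are hypothetical.

```python
def posterior_probability(prior, likelihood_ratios):
    """Combine independent evidence into a posterior probability.

    prior:             prior probability that two genes are functionally related.
    likelihood_ratios: one ratio P(evidence | related) / P(evidence | unrelated)
                       per evidence source (e.g. phylogenetic profile,
                       gene neighborhood, GO assignment).
    """
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1.0 + odds)


# Hypothetical numbers: a low prior, two supportive sources, one neutral.
score = posterior_probability(0.1, [3.0, 3.0, 1.0])
is_linked = score >= 0.5  # threshold step; the cutoff value is an assumption
```

Gene pairs whose combined score passes the threshold would then be linked, and connected groups of linked genes form candidate modules.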
19.
Nucleic Acids Res ; 33(2): 546-58, 2005.
Article in English | MEDLINE | ID: mdl-15673715

ABSTRACT

Computational evaluation of protein-DNA interaction is important for the identification of DNA-binding sites and genome annotation. It could validate the predicted binding motifs by sequence-based approaches through the calculation of the binding affinity between a protein and DNA. Such an evaluation should take into account structural information to deal with the complicated effects from DNA structural deformation, distance-dependent multi-body interactions and solvation contributions. In this paper, we present a knowledge-based potential built on interactions between protein residues and DNA tri-nucleotides. The potential, which explicitly considers the distance-dependent two-body, three-body and four-body interactions between protein residues and DNA nucleotides, has been optimized in terms of a Z-score. We have applied this knowledge-based potential to evaluate the binding affinities of zinc-finger protein-DNA complexes. The predicted binding affinities are in good agreement with the experimental data (with a correlation coefficient of 0.950). On a larger test set containing 48 protein-DNA complexes with known experimental binding free energies, our potential has achieved a high correlation coefficient of 0.800, when compared with the experimental data. We have also used this potential to identify binding motifs in DNA sequences of transcription factors (TF). The TFs in 79.4% of the known TF-DNA complexes have accurately found their native binding sequences from a large pool of DNA sequences. When tested in a genome-scale search for TF-binding motifs of the cyclic AMP regulatory protein (CRP) of Escherichia coli, this potential ranks all known binding motifs of CRP in the top 15% of all candidate sequences.


Subject(s)
Computational Biology/methods , DNA-Binding Proteins/metabolism , DNA/chemistry , DNA/metabolism , Models, Statistical , Base Sequence , Binding Sites , Cyclic AMP Receptor Protein , DNA-Binding Proteins/chemistry , Escherichia coli/genetics , Escherichia coli Proteins/metabolism , Genome, Bacterial , Genomics/methods , Receptors, Cell Surface/metabolism , Transcription Factors/chemistry , Transcription Factors/metabolism , Zinc Fingers
20.
Genome Inform ; 16(2): 247-59, 2005.
Article in English | MEDLINE | ID: mdl-16901107

ABSTRACT

We present a computational method for prediction of functional modules that can be directly applied to the newly sequenced microbial genomes for predicting gene functions and the component genes of biological pathways. We first quantify the functional relatedness among genes based on their distribution (i.e., their existences and orders) across multiple microbial genomes, and obtain a gene network in which every pair of genes is associated with a score representing their functional relatedness. We then apply a threshold-based clustering algorithm to this gene network, and obtain modules for each of which the number of genes is bounded from above by a pre-specified value and the component genes are more strongly functionally related to each other than genes across the predicted modules. Particularly, when the module size is bounded by 130, we obtain 167 functional modules covering 813 genes for Escherichia coli K12, and 138 functional modules covering 731 genes for Bacillus subtilis subsp. subtilis str. 168. We have used the gene ontology (GO) information to assess the prediction results. The GO similarities among the genes of the same functional module are compared with the GO similarities among the genes that are randomly clustered together. This comparison reveals that our predicted functional modules are statistically and biologically significant, and the genes of the same functional module share more commonality in terms of biological process than in terms of molecular function or cellular component. We have also examined the predicted functional modules that are common to both Escherichia coli K12 and Bacillus subtilis subsp. subtilis str. 168, and provide explanations for some functional modules.


Subject(s)
Computational Biology/methods , Genes, Bacterial , Genetics, Microbial/methods , Genome, Bacterial , Genomics/methods , Models, Genetic , Cluster Analysis , Genes, Bacterial/physiology , Genome, Bacterial/physiology , Predictive Value of Tests
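
A threshold-based clustering with an upper bound on module size could look like the following sketch. It is one possible reading of the abstract, not the published algorithm: gene pairs are visited from the strongest score down, and two groups are merged only if the pair passes the threshold and the merged group stays within the bound. The name `bounded_modules` and the input format are hypothetical.

```python
def bounded_modules(scores, threshold, max_size):
    """Group genes into modules of at most max_size members.

    scores: dict mapping (gene_a, gene_b) -> functional relatedness score.
    """
    parent, size = {}, {}

    def find(g):
        parent.setdefault(g, g)
        size.setdefault(g, 1)
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    # Strongest relationships first, so tight groups form before loose ones.
    for (a, b), s in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s < threshold:
            break
        ra, rb = find(a), find(b)
        if ra != rb and size[ra] + size[rb] <= max_size:
            parent[ra] = rb
            size[rb] += size[ra]

    modules = {}
    for g in list(parent):
        modules.setdefault(find(g), set()).add(g)
    # Singletons are not reported as modules.
    return [m for m in modules.values() if len(m) > 1]


scores = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.7, ("e", "f"): 0.2}
modules = bounded_modules(scores, threshold=0.5, max_size=3)
```

In the example, `d` is left out because adding it would exceed the size bound of 3, and the `e`/`f` pair falls below the threshold.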