Results 1 - 7 of 7
1.
BMC Bioinformatics ; 17(1): 406, 2016 Oct 06.
Article in English | MEDLINE | ID: mdl-27716039

ABSTRACT

BACKGROUND: Biological sequence motifs drive the specific interactions of proteins and nucleic acids. Accordingly, the effective computational discovery and analysis of such motifs is a central theme in bioinformatics. Many practical questions about the properties of motifs can be recast as random sampling problems. In this light, the task is to determine for a given motif whether a certain feature of interest is statistically unusual among relevantly similar alternatives. Despite the generality of this framework, its use has been frustrated by the difficulties of defining an appropriate reference class of motifs for comparison and of sampling from it effectively. RESULTS: We define two distributions over the space of all motifs of given dimension. The first is the maximum entropy distribution subject to mean information content, and the second is the truncated uniform distribution over all motifs having information content within a given interval. We derive exact sampling algorithms for each. As a proof of concept, we employ these sampling methods to analyze a broad collection of prokaryotic and eukaryotic transcription factor binding site motifs. In addition to positional information content, we consider the informational Gini coefficient of the motif, a measure of the degree to which information is evenly distributed throughout a motif's positions. We find that both prokaryotic and eukaryotic motifs tend to exhibit higher informational Gini coefficients (IGC) than would be expected by chance under either reference distribution. As a second application, we apply maximum entropy sampling to the motif p-value problem and use it to give elementary derivations of two new estimators. CONCLUSIONS: Despite the historical centrality of biological sequence motif analysis, this study constitutes to our knowledge the first use of principled null hypotheses for sequence motifs given information content. 
Through their use, we are able to characterize for the first time differences in global motif statistics between biological motifs and their null distributions. In particular, we observe that biological sequence motifs show an unusual distribution of IGC, presumably due to biochemical constraints on the mechanisms of direct read-out.


Subject(s)
Algorithms , Computational Biology/methods , Nucleotide Motifs/genetics , Proteins/metabolism , Transcription Factors/genetics , Binding Sites , Databases, Genetic , Eukaryotic Cells/metabolism , Humans , Prokaryotic Cells/metabolism , Proteins/chemistry , Sequence Analysis, DNA/methods , Transcription Factors/metabolism
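For entry 1, the informational Gini coefficient (IGC) can be illustrated with a minimal Python sketch. It assumes a DNA motif stored as a list of columns of base probabilities (A, C, G, T), per-position information content measured against a uniform background, and no pseudocounts; the function names are hypothetical, and the paper's exact IGC definition may differ in these details. The sketch does not reproduce the maximum entropy or truncated uniform sampling algorithms.

```python
import math


def position_information(column, background=0.25):
    """Information content (bits) of one motif column relative to a uniform
    background: sum over bases of p_b * log2(p_b / q_b)."""
    return sum(p * math.log2(p / background) for p in column if p > 0)


def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-based formula on sorted values, ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def informational_gini(motif):
    """motif: list of columns, each a list of 4 base probabilities (A, C, G, T)."""
    return gini([position_information(col) for col in motif])
```

A motif whose information is concentrated in one of two positions, such as [[1, 0, 0, 0], [0.25, 0.25, 0.25, 0.25]], gives an IGC of 0.5 (the maximum for two positions), while a motif with equal information at every position gives 0.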
2.
Algorithms Mol Biol ; 11: 19, 2016.
Article in English | MEDLINE | ID: mdl-27398089

ABSTRACT

BACKGROUND: Metagenomics enables the analysis of bacterial population composition and the study of emergent population features, such as shared metabolic pathways. Recently, we have shown that metagenomics datasets can be leveraged to characterize population-wide transcriptional regulatory networks, or meta-regulons, providing insights into how bacterial populations respond collectively to specific triggers. Here we formalize a Bayesian inference framework to analyze the composition of transcriptional regulatory networks in metagenomes by determining the probability of regulation of orthologous gene sequences. We assess the performance of this approach on synthetic datasets and we validate it by analyzing the copper-homeostasis network of Firmicutes species in the human gut microbiome. RESULTS: Assessment on synthetic datasets shows that our method provides a robust and interpretable metric for assessing putative regulation by a transcription factor on sets of promoter sequences mapping to an orthologous gene cluster. The inference framework integrates the regulatory contribution of secondary sites and can discern false positives arising from multiple instances of a clonal sequence. Posterior probabilities for orthologous gene clusters decline sharply when fewer than 20% of mapped promoters have binding sites, but we introduce a sensitivity adjustment procedure to speed up computation that enhances regulation assessment in heterogeneous ortholog clusters. Analysis of the copper-homeostasis regulon governed by CsoR in human gut microbiome Firmicutes reveals that CsoR controls itself and copper-translocating P-type ATPases, but not CopZ-type copper chaperones. Our analysis also indicates that CsoR frequently targets promoters with dual CsoR-binding sites, suggesting that it exploits higher-order binding conformations to fine-tune its activity.
CONCLUSIONS: We introduce and validate a method for the analysis of transcriptional regulatory networks from metagenomic data that enables inference of meta-regulons in a systematic and interpretable way. Validation of this method on the CsoR meta-regulon of gut microbiome Firmicutes illustrates the usefulness of the approach, revealing novel properties of the copper-homeostasis network in poorly characterized bacterial species and putting forward evidence of new mechanisms of DNA binding for this transcriptional regulator. Our approach will enable the comparative analysis of regulatory networks across metagenomes, yielding novel insights into the evolution of transcriptional regulatory networks.
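The posterior probability of regulation described in entry 2 rests on Bayes' rule over two hypotheses for an orthologous gene cluster. The sketch below is a generic illustration, not the authors' model: it assumes that per-promoter log-likelihoods under a hypothetical "binding site present" model and a background model are already available, that each promoter in a regulated cluster carries a site with an assumed probability alpha, and that promoters are independent.

```python
import math


def log_mix(log_a, log_b, w):
    """Return log(w * exp(log_a) + (1 - w) * exp(log_b)) without underflow."""
    m = max(log_a, log_b)
    return m + math.log(w * math.exp(log_a - m) + (1 - w) * math.exp(log_b - m))


def sigmoid(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def posterior_regulation(site_loglik, bg_loglik, alpha=0.5, prior=0.5):
    """Posterior probability that a cluster of promoters is regulated.

    site_loglik[i], bg_loglik[i]: log-likelihood of promoter i's data under the
    (hypothetical) site-present model and the background model.
    alpha: assumed probability (0 < alpha < 1) that a promoter in a regulated
    cluster has a site.
    prior: assumed prior probability that the cluster is regulated.
    """
    # Regulated hypothesis: each promoter is a site/background mixture.
    log_r = sum(log_mix(s, b, alpha) for s, b in zip(site_loglik, bg_loglik))
    # Background hypothesis: every promoter follows the background model.
    log_b = sum(bg_loglik)
    # Bayes' rule expressed as log posterior odds.
    log_odds = math.log(prior / (1 - prior)) + log_r - log_b
    return sigmoid(log_odds)
```

With identical evidence under both models the posterior stays at the prior (0.5 here); working in log space avoids underflow when a cluster contains many promoters.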

3.
PLoS Comput Biol ; 12(3): e1004796, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26953935

ABSTRACT

Activation of CD4+ T cells requires the recognition of peptides that are presented by HLA class II molecules and can be assessed experimentally using the ELISpot assay. However, even given an individual's HLA class II genotype, identifying which class II molecule is responsible for a positive ELISpot response to a given peptide is not trivial. The two main difficulties are the number of HLA class II molecules that can potentially be formed in a single individual (3 to 14) and the lack of clear peptide binding motifs for class II molecules. Here, we present a Bayesian framework to interpret ELISpot data (BIITE: Bayesian Immunogenicity Inference Tool for ELISpot); specifically, BIITE identifies which HLA-II:peptide combination(s) are immunogenic based on cohort ELISpot data. We apply BIITE to two ELISpot datasets and explore the expected performance using simulations. We show that this method can achieve high accuracy, depending on the cohort size and the success rate of the ELISpot assay within the cohort.


Subject(s)
Computational Biology/methods , Enzyme-Linked Immunospot Assay/methods , Epitopes, T-Lymphocyte/chemistry , Epitopes, T-Lymphocyte/immunology , Histocompatibility Antigens Class II/chemistry , Histocompatibility Antigens Class II/immunology , Models, Immunological , Software , Algorithms , Burkholderia pseudomallei/immunology , Computer Simulation , Databases, Factual , Humans , Melioidosis/immunology , Peptides/analysis , Peptides/chemistry , Peptides/immunology
4.
Cancer Genet ; 208(9): 441-7, 2015 Sep.
Article in English | MEDLINE | ID: mdl-26227479

ABSTRACT

The information-theoretic concept of Shannon entropy can be used to quantify the information provided by a diagnostic test. We hypothesized that in tumor types with stereotyped mutational profiles, the results of NGS testing would yield lower average information than in tumors with more diverse mutations. To test this hypothesis, we estimated the entropy of NGS testing in various cancer types, using results obtained from clinical sequencing. A set of 238 tumors was subjected to clinical targeted NGS covering all exons of 27 genes. There were 120 actionable variants in 109 cases, occurring in the genes KRAS, EGFR, PTEN, PIK3CA, KIT, BRAF, NRAS, IDH1, and JAK2. Sequencing results for each tumor were modeled as a dichotomized genotype (actionable mutation detected or not detected) for each of the 27 genes. Based upon the entropy of these genotypes, sequencing was most informative for colorectal cancer (3.235 bits of information/case) followed by high grade glioma (2.938 bits), lung cancer (2.197 bits), pancreatic cancer (1.339 bits), and sarcoma/STTs (1.289 bits). In the most informative cancer types, the information content of NGS was similar to surgical pathology examination (modeled at approximately 2-3 bits). Entropy provides a novel measure of utility for laboratory testing in general and for NGS in particular. This metric is, however, purely analytical and does not capture the relative clinical significance of the identified variants, which may also differ across tumor types.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Neoplasms/diagnosis , Sequence Analysis, DNA/methods , DNA Mutational Analysis/methods , Entropy , Humans , Models, Genetic , Neoplasms/genetics , Retrospective Studies
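The entropy measure in entry 4 can be sketched as follows, under the hypothetical assumption that genes are independent, so that the information per case is the sum of per-gene binary entropies, with p estimated as the fraction of cases carrying an actionable mutation in that gene. The abstract does not state exactly how the genotype entropy was computed, so this sketch may not reproduce the published values; the counts in the usage example are invented for illustration.

```python
import math


def binary_entropy(p):
    """Shannon entropy (bits) of a yes/no outcome with P(yes) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def panel_entropy(mutated_counts, n_cases):
    """Entropy (bits/case) of a gene panel's dichotomized results, assuming
    (hypothetically) independence between genes.

    mutated_counts: dict mapping gene -> number of cases with an actionable
    mutation in that gene. Genes never mutated contribute 0 bits.
    """
    return sum(binary_entropy(k / n_cases) for k in mutated_counts.values())
```

For example, panel_entropy({"KRAS": 5, "BRAF": 5}, 10) returns 2.0 bits, since each gene is mutated in half of the cases; a gene that is never (or always) mutated adds no information.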
5.
J Comput Biol ; 21(5): 373-84, 2014 May.
Article in English | MEDLINE | ID: mdl-24689750

ABSTRACT

Transcription factors (TFs) regulate transcription by binding to specific sites in promoter regions. Information theory provides a useful mathematical framework to analyze the binding motifs associated with TFs but imposes several assumptions that limit its applicability to specific regulatory scenarios. Explicit simulations of the co-evolution of TFs and their binding motifs allow the study of the evolution of regulatory networks with a high degree of realism. In this work we analyze the impact of differential regulatory demands on the information content of TF-binding motifs by means of evolutionary simulations. We generalize a predictive index based on information theory, and we validate its applicability to regulatory scenarios in which the TF binds significantly to the genomic background. Our results show a logarithmic dependence of the evolved information content on the occupancy of target sites and indicate that TFs may actively exploit pseudo-sites to modulate their occupancy of target sites. In regulatory networks with differentially regulated targets, we observe that information content in TF-binding motifs is dictated primarily by the fraction of total probability mass that the TF assigns to its target sites, and we provide a predictive index to estimate the amount of information associated with arbitrarily complex regulatory systems. We observe that complex regulatory patterns can exert additional demands on evolved information content, but, given a total occupancy for target sites, we do not find conclusive evidence that this effect is due to the range of required binding affinities.


Subject(s)
Gene Expression Regulation/genetics , Transcription, Genetic/genetics , Binding Sites/genetics , Computational Biology/methods , Evolution, Molecular , Genomics/methods , Promoter Regions, Genetic/genetics , Protein Binding/genetics , Transcription Factors/genetics
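Entry 5 does not name the information-theoretic index it generalizes. Assuming it is Schneider's classical R_frequency, the sketch below computes R_frequency = log2(G / gamma), the information needed to locate gamma target sites among G genomic positions, alongside R_sequence, the information content of the aligned binding sites; in the classical theory the two are expected to be approximately equal. The sketch assumes a uniform background and applies no small-sample correction, and it does not implement the paper's generalization to background binding or differential regulation.

```python
import math
from collections import Counter


def r_frequency(genome_length, n_sites):
    """R_frequency = log2(G / gamma) in bits, assuming all G positions are
    equally likely a priori to be target sites."""
    return math.log2(genome_length / n_sites)


def r_sequence(sites):
    """R_sequence (bits per site): sum over aligned positions of 2 - H(position).

    Assumes equal-length DNA sites, a uniform background and no
    small-sample correction.
    """
    total = 0.0
    for column in zip(*sites):
        n = len(column)
        counts = Counter(column)
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h
    return total
```

For instance, r_frequency(1024, 1) is 10 bits, and r_sequence(["AC", "AG"]) is 3 bits: 2 bits for the fully conserved first position plus 1 bit for the second.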
6.
Bioinformatics ; 30(9): 1193-7, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24407225

ABSTRACT

MOTIVATION: Data from metagenomics projects remain largely untapped for the analysis of transcriptional regulatory networks. Here, we provide proof-of-concept that metagenomic data can be effectively leveraged to analyze regulatory networks by characterizing the SOS meta-regulon in the human gut microbiome. RESULTS: We combine well-established in silico and in vitro techniques to mine the human gut microbiome data and determine the relative composition of the SOS network in a natural setting. Our analysis highlights the importance of translesion synthesis as a primary function of the SOS response. We predict the association of this network with three novel protein clusters involved in cell wall biogenesis, chromosome partitioning and restriction modification, and we confirm binding of the SOS response transcriptional repressor to sites in the promoter of a cell wall biogenesis enzyme, a phage integrase and a death-on-curing protein. We discuss the implications of these findings and the potential for this approach for metagenome analysis.


Subject(s)
Gastrointestinal Tract/microbiology , Metagenomics/methods , Microbiota , Regulon , SOS Response, Genetics , Bacterial Proteins/genetics , Bacterial Proteins/metabolism , Bacteriophages/genetics , Base Sequence , Binding Sites , Humans , Metagenome , Serine Endopeptidases/genetics , Serine Endopeptidases/metabolism
7.
PLoS One ; 8(10): e76177, 2013.
Article in English | MEDLINE | ID: mdl-24116094

ABSTRACT

Codon usage bias (CUB) results from the complex interplay between translational selection and mutational biases. Current methods for CUB analysis apply heuristics to integrate both components, limiting the depth and scope of CUB analysis as a technique to probe into the evolution and optimization of protein-coding genes. Here we introduce a self-consistent CUB index (scnRCA) that incorporates implicit correction for mutational biases, facilitating exploration of the translational selection component of CUB. We validate this technique using gene expression data and we apply it to a detailed analysis of CUB in the Pseudomonadales. Our results illustrate how the selective enrichment of specific codons among highly expressed genes is preserved in the context of genome-wide shifts in codon frequencies, and how the balance between mutational and translational biases leads to varying definitions of codon optimality. We extend this analysis to other moderate- and fast-growing bacteria and we provide unified support for the hypothesis that C- and A-ending codons of two-box amino acids, and the U-ending codons of four-box amino acids, are systematically enriched among highly expressed genes across bacteria. The use of an unbiased estimator of CUB allows us to report for the first time that the signature of translational selection is strongly conserved in the Pseudomonadales in spite of drastic changes in genome composition, and extends well beyond the core set of highly optimized genes in each genome. We generalize these results to other moderate- and fast-growing bacteria, hinting at selection for a universal pattern of gene expression that is conserved and detectable in conserved patterns of codon usage bias.


Subject(s)
Codon , Mutation , Selection, Genetic , Databases, Genetic , Genome , Moraxellaceae/genetics , Pseudomonas/genetics
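The scnRCA index in entry 7 appears to build on the relative codon adaptation (RCA) index, which scores each codon by its observed frequency in a reference gene set relative to the frequency expected from position-specific base composition, rc_xyz = f(xyz) / (f1(x) f2(y) f3(z)), and scores a gene by the geometric mean of rc over its codons. The sketch below implements only that basic RCA computation with an assumed pseudocount; the self-consistent reference-set selection and mutational-bias correction that define scnRCA are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def codons(seq):
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rca_weights(reference_genes, pseudocount=0.5):
    """rc_xyz = f(xyz) / (f1(x) * f2(y) * f3(z)) for all 64 codons, estimated
    from a reference gene set with an assumed pseudocount (must be > 0)."""
    codon_counts = Counter()
    pos_counts = [Counter(), Counter(), Counter()]
    for gene in reference_genes:
        for c in codons(gene.upper()):
            if all(b in BASES for b in c):  # skip ambiguous codons
                codon_counts[c] += 1
                for i, base in enumerate(c):
                    pos_counts[i][base] += 1
    n = sum(codon_counts.values())
    weights = {}
    for x, y, z in product(BASES, repeat=3):
        c = x + y + z
        f_codon = (codon_counts[c] + pseudocount) / (n + 64 * pseudocount)
        # Expected frequency from position-specific base composition.
        f_pos = 1.0
        for i, base in enumerate(c):
            f_pos *= (pos_counts[i][base] + pseudocount) / (n + 4 * pseudocount)
        weights[c] = f_codon / f_pos
    return weights


def rca(gene, weights):
    """Geometric mean of rc weights over the gene's scorable codons."""
    logs = [math.log(weights[c]) for c in codons(gene.upper()) if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))
```

Dividing by the product of positional base frequencies is what discounts codons that are common merely because of genome-wide base composition, which is the mutational-bias component that scnRCA aims to correct more thoroughly.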