Results 1 - 19 of 19
1.
Nat Commun ; 8(1): 1346, 2017 11 07.
Article in English | MEDLINE | ID: mdl-29116202

ABSTRACT

Acetylation of the histone variant H2A.Z (H2A.Zac) occurs at active promoters and is associated with oncogene activation in prostate cancer, but its role in enhancer function is still poorly understood. Here we show that H2A.Zac-containing nucleosomes are commonly redistributed to neo-enhancers in cancer, resulting in a concomitant gain of chromatin accessibility and ectopic gene expression. Notably, incorporation of acetylated H2A.Z nucleosomes is a pre-requisite for activation of androgen receptor (AR)-associated enhancers. H2A.Zac nucleosome occupancy is rapidly remodeled to flank the AR sites to initiate the formation of nucleosome-free regions and the production of AR-enhancer RNAs upon androgen treatment. Remarkably, higher levels of global H2A.Zac correlate with poorer prognosis. Altogether, these data demonstrate the novel contribution of H2A.Zac in activation of newly formed enhancers in prostate cancer.


Subject(s)
Enhancer Elements, Genetic/genetics , Histones/metabolism , Prostatic Neoplasms/genetics , Acetylation , Chromatin/genetics , Chromatin/metabolism , Disease-Free Survival , Epigenesis, Genetic , Gene Expression Regulation, Neoplastic , Histones/genetics , Humans , Male , Nucleosomes/genetics , Nucleosomes/metabolism , Prostatic Neoplasms/metabolism , Prostatic Neoplasms/mortality , Receptors, Androgen/genetics , Receptors, Androgen/metabolism
2.
Sci Rep ; 7(1): 7163, 2017 08 02.
Article in English | MEDLINE | ID: mdl-28769061

ABSTRACT

The long non-coding RNA PARTICLE (Gene PARTICL- 'Promoter of MAT2A-Antisense RadiaTion Induced Circulating LncRNA') partakes in triple helix (triplex) formation, is transiently elevated following low dose irradiation and regulates transcription of its neighbouring gene - Methionine adenosyltransferase 2A. It now emerges that PARTICLE triplex sites are predicted in many different genes across all human chromosomes. In silico analysis identified additional regions for PARTICLE triplexes at >1600 genomic locations. Multiple PARTICLE triplexes are clustered predominantly within the human and mouse tumor suppressor WW Domain Containing Oxidoreductase (WWOX) gene. Surface plasmon resonance diffraction and electrophoretic mobility shift assays were consistent with PARTICLE triplex formation within human WWOX, with high resolution imaging demonstrating its enrichment at this locus on chromosome 16. PARTICLE knockdown and over-expression resulted in inverse changes in WWOX transcript levels, with siRNA interference eliminating PARTICLE's elevated transcription in response to irradiation. The evidence for a second functional site of PARTICLE triplex formation at WWOX suggests that PARTICLE may form triplex-mediated interactions at multiple positions in the human genome including remote loci. These findings provide a mechanistic explanation for the ability of lncRNAs to regulate the expression of numerous genes distributed across the genome.


Subject(s)
Genome, Human , RNA, Long Noncoding/chemistry , RNA, Long Noncoding/genetics , Tumor Suppressor Proteins/genetics , WW Domain-Containing Oxidoreductase/genetics , Animals , Binding Sites , Cell Line, Tumor , Cell Survival , Chromosomes, Human, Pair 16 , Disease Susceptibility , Epistasis, Genetic , Gene Expression Regulation , Genetic Loci , Genome , Humans , MAP Kinase Signaling System , Mice , Nucleic Acid Conformation , Promoter Regions, Genetic , Protein Binding , RNA Interference , RNA, Small Interfering/genetics , Transcription, Genetic
3.
Genome Res ; 26(6): 719-31, 2016 06.
Article in English | MEDLINE | ID: mdl-27053337

ABSTRACT

A three-dimensional chromatin state underpins the structural and functional basis of the genome by bringing regulatory elements and genes into close spatial proximity to ensure proper, cell-type-specific gene expression profiles. Here, we performed Hi-C chromosome conformation capture sequencing to investigate how three-dimensional chromatin organization is disrupted in the context of copy-number variation, long-range epigenetic remodeling, and atypical gene expression programs in prostate cancer. We find that cancer cells retain the ability to segment their genomes into megabase-sized topologically associated domains (TADs); however, these domains are generally smaller due to establishment of additional domain boundaries. Interestingly, a large proportion of the new cancer-specific domain boundaries occur at regions that display copy-number variation. Notably, a common deletion on 17p13.1 in prostate cancer spanning the TP53 tumor suppressor locus results in bifurcation of a single TAD into two distinct smaller TADs. Change in domain structure is also accompanied by novel cancer-specific chromatin interactions within the TADs that are enriched at regulatory elements such as enhancers, promoters, and insulators, and associated with alterations in gene expression. We also show that differential chromatin interactions across regulatory regions occur within long-range epigenetically activated or silenced regions of concordant gene activation or repression in prostate cancer. Finally, we present a novel visualization tool that enables integrated exploration of Hi-C interaction data, the transcriptome, and epigenome. This study provides new insights into the relationship between long-range epigenetic and genomic dysregulation and changes in higher-order chromatin interactions in cancer.


Subject(s)
Chromatin/genetics , Epigenesis, Genetic , Neoplasms/genetics , CCCTC-Binding Factor , Cell Line, Tumor , Enhancer Elements, Genetic , Gene Expression Regulation, Neoplastic , Genome, Human , Histones/metabolism , Humans , Molecular Sequence Annotation , Neoplasms/metabolism , Protein Binding , Protein Processing, Post-Translational , Repressor Proteins/physiology
4.
BMC Genomics ; 16: 1052, 2015 Dec 10.
Article in English | MEDLINE | ID: mdl-26651996

ABSTRACT

BACKGROUND: Genomic information is increasingly used in medical practice, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. The widely used Hadoop MapReduce architecture and associated machine learning library, Mahout, provide the means for tackling computationally challenging tasks. However, many genomic analyses do not fit the Map-Reduce paradigm. We therefore utilise the recently developed SPARK engine, along with its associated machine learning library, MLlib, which offers more flexibility in the parallelisation of population-scale bioinformatics tasks. The resulting tool, VARIANTSPARK, provides an interface from MLlib to the standard variant format (VCF), offers seamless genome-wide sampling of variants and provides a pipeline for visualising results. RESULTS: To demonstrate the capabilities of VARIANTSPARK, we clustered more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VARIANTSPARK is 80% faster than the SPARK-based genome clustering approach, ADAM, the comparable implementation using Hadoop/Mahout, as well as ADMIXTURE, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. CONCLUSION: The benefits of speed, resource consumption and scalability enable VARIANTSPARK to open up the usage of advanced, efficient machine learning algorithms to genomic data.


Subject(s)
Computational Biology/methods , Genotype , Algorithms , Cluster Analysis , Humans , Polymorphism, Single Nucleotide , Software
5.
Kidney Int ; 88(2): 321-31, 2015 Aug.
Article in English | MEDLINE | ID: mdl-25993318

ABSTRACT

The Wilms' tumor suppressor WT1 is a key regulator of podocyte function that is mutated in Denys-Drash and Frasier syndromes. Here we have used an integrative approach employing ChIP, exon array, and genetic analyses in mice to address general and isoform-specific functions of WT1 in podocyte differentiation. Analysis of ChIP-Seq data showed that almost half of the podocyte-specific genes are direct targets of WT1. Bioinformatic analysis further identified coactivator FOXC1-binding sites in proximity to WT1-bound regions, thus supporting coordinated action of these transcription factors in regulating podocyte-specific genes. Transcriptional profiling of mice lacking the WT1 alternative splice isoform (+KTS) revealed a more restricted set of genes whose expression depends on these alternatively spliced isoforms. One of these genes encodes the membrane-associated guanylate kinase MAGI2, a protein that localizes to the base of the slit diaphragm. Using functional analysis in mice, we further show that MAGI2α is essential for proper localization of nephrin and the assembly of the slit diaphragm complex. Finally, a dramatic reduction of MAGI2 was found in an LPS mouse model of glomerular injury and in genetic cases of human disease. Thus, our study highlights the central role of WT1 in podocyte differentiation and identifies MAGI2α as the crucial isoform in slit diaphragm assembly, suggesting a causative role of this gene in the etiology of glomerular disorders.


Subject(s)
Adaptor Proteins, Signal Transducing/genetics , Adaptor Proteins, Signal Transducing/metabolism , Cell Differentiation/genetics , Guanylate Kinases/genetics , Guanylate Kinases/metabolism , Podocytes/physiology , Repressor Proteins/genetics , Transcription, Genetic , Alternative Splicing , Animals , Binding Sites , Down-Regulation/drug effects , Exons , Female , Forkhead Transcription Factors/genetics , Glomerulonephritis, Membranoproliferative/metabolism , Glomerulosclerosis, Focal Segmental/metabolism , Humans , Lipopolysaccharides/pharmacology , Membrane Proteins/metabolism , Mice , Mutation , Oligonucleotide Array Sequence Analysis , Podocytes/pathology , Promoter Regions, Genetic , Protein Isoforms/genetics , Repressor Proteins/metabolism , WT1 Proteins
6.
Cell Rep ; 11(3): 474-85, 2015 Apr 21.
Article in English | MEDLINE | ID: mdl-25900080

ABSTRACT

Exposure to low-dose irradiation causes transiently elevated expression of the long ncRNA PARTICLE (gene PARTICLE, promoter of MAT2A-antisense radiation-induced circulating lncRNA). PARTICLE affords both a cytosolic scaffold for the tumor suppressor methionine adenosyltransferase (MAT2A) and a nuclear genetic platform for transcriptional repression. In situ hybridization discloses that PARTICLE and MAT2A associate together following irradiation. Bromouridine tracing and presence in exosomes indicate intercellular transport, and this is supported by ex vivo data from radiotherapy-treated patients. Surface plasmon resonance indicates that PARTICLE forms a DNA-lncRNA triplex upstream of a MAT2A promoter CpG island. We show that PARTICLE represses MAT2A via methylation and demonstrate that the radiation-induced PARTICLE interacts with the transcription-repressive complex proteins G9a and SUZ12 (subunit of PRC2). The interplay of PARTICLE with MAT2A implicates this lncRNA in intercellular communication and as a recruitment platform for gene-silencing machineries through triplex formation in response to irradiation.


Subject(s)
DNA Methylation/radiation effects , Gene Expression Regulation/radiation effects , RNA, Long Noncoding/biosynthesis , RNA, Long Noncoding/genetics , Carcinoma, Squamous Cell/radiotherapy , Cell Line , Chromatin Immunoprecipitation , DNA Methylation/genetics , Electrophoretic Mobility Shift Assay , Head and Neck Neoplasms/radiotherapy , Humans , Immunoblotting , In Situ Hybridization , Methionine Adenosyltransferase/biosynthesis , Methionine Adenosyltransferase/genetics , Oligonucleotide Array Sequence Analysis , Radiation, Ionizing , Squamous Cell Carcinoma of Head and Neck , Surface Plasmon Resonance
8.
Nat Commun ; 5: 4444, 2014 Jul 17.
Article in English | MEDLINE | ID: mdl-25031030

ABSTRACT

Kidney organogenesis requires the tight control of proliferation, differentiation and apoptosis of renal progenitor cells. How the balance between these cellular decisions is achieved remains elusive. The Wilms' tumour suppressor Wt1 is required for progenitor survival, but the molecular cause for renal agenesis in mutants is poorly understood. Here we demonstrate that lack of Wt1 abolishes fibroblast growth factor (FGF) and induces BMP/pSMAD signalling within the metanephric mesenchyme. Addition of recombinant FGFs or inhibition of pSMAD signalling rescues progenitor cell apoptosis induced by the loss of Wt1. We further show that recombinant BMP4, but not BMP7, induces an apoptotic response within the early kidney that can be suppressed by simultaneous addition of FGFs. These data reveal a hitherto unknown sensitivity of early renal progenitors to pSMAD signalling, establish FGF and pSMAD signalling as antagonistic forces in early kidney development and place WT1 as a key regulator of pro-survival FGF signalling pathway genes.


Subject(s)
Fibroblast Growth Factors/metabolism , Repressor Proteins/metabolism , Animals , Bone Morphogenetic Proteins/genetics , Bone Morphogenetic Proteins/metabolism , Cell Differentiation/genetics , Cell Differentiation/physiology , Computational Biology , Fibroblast Growth Factors/genetics , Fluorescent Antibody Technique , Gene Expression Regulation, Developmental/genetics , Gene Expression Regulation, Developmental/physiology , In Situ Hybridization , In Situ Nick-End Labeling , Mice , Mice, Mutant Strains , Organ Culture Techniques , Repressor Proteins/genetics , Reverse Transcriptase Polymerase Chain Reaction , Signal Transduction/genetics , Signal Transduction/physiology , Stem Cells/metabolism , WT1 Proteins
9.
Trends Mol Med ; 20(9): 479-86, 2014 Sep.
Article in English | MEDLINE | ID: mdl-24801560

ABSTRACT

Genome sequencing has the potential for stratified cancer treatment and improved diagnostics for rare disorders. However, sequencing needs to be utilised in risk stratification on a population scale to deepen the impact on the health system by addressing common diseases, where individual genomic variants have variable penetrance and minor impact. As the accuracy of genomic risk predictors is bounded by heritability, environmental factors such as diet, lifestyle, and microbiome have to be considered. Large-scale, longitudinal research programmes need to study the intrinsic properties between both genetics and environment to unravel their risk contribution. During this discovery process, frameworks need to be established to counteract unrealistic expectations. Sufficient scientific evidence is needed to interpret sources of uncertainty and inform decision making for clinical management and personal health.


Subject(s)
Delivery of Health Care/methods , Genomics/methods , Precision Medicine , Electronic Health Records , Genomics/trends , Humans , Nutrition Assessment , Risk Factors
10.
Bioinformatics ; 30(10): 1471-2, 2014 May 15.
Article in English | MEDLINE | ID: mdl-24470576

ABSTRACT

SUMMARY: The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. AVAILABILITY AND IMPLEMENTATION: NGSANE is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. CONTACT: Denis.Bauer@csiro.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , Automation, Laboratory , Humans , Software
11.
Bioinformatics ; 29(15): 1895-7, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23740745

ABSTRACT

SUMMARY: At the heart of many modern biotechnological and therapeutic applications lies the need to target specific genomic loci with pinpoint accuracy. Although landmark experiments demonstrate technological maturity in manufacturing and delivering genetic material, the genomic sequence analysis to find suitable targets lags behind. We provide a computational aid for the sophisticated design of sequence-specific ligands and selection of appropriate targets, taking gene location and genomic architecture into account. AVAILABILITY: Source code and binaries are downloadable from www.bioinformatics.org.au/triplexator/inspector. CONTACT: t.bailey@uq.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
DNA/chemistry , Gene Targeting , Software , Genetic Loci , Genomics , Humans , Peptide Nucleic Acids/chemistry
12.
Genome Res ; 22(7): 1372-81, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22550012

ABSTRACT

Double-stranded DNA is able to form triple-helical structures by accommodating a third nucleotide strand in its major groove. This sequence-specific process offers a potent mechanism for targeting genomic loci of interest that is of great value for biotechnological and gene-therapeutic applications. It is likely that nature has leveraged this addressing system for gene regulation, because computational studies have uncovered an abundance of putative triplex target sites in various genomes, with enrichment particularly in gene promoters. However, to draw a more complete picture of the in vivo role of triplexes, not only the putative targets but also the sequences acting as the third strand and their capability to pair with the predicted target sites need to be studied. Here we present Triplexator, the first computational framework that integrates all aspects of triplex formation, and showcase its potential by discussing research examples for which the different aspects of triplex formation are important. We find that chromatin-associated RNAs have a significantly higher fraction of sequence features able to form triplexes than expected at random, suggesting their involvement in gene regulation. We furthermore identify hundreds of human genes that contain sequence features in their promoter predicted to be able to form a triplex with a target within the same promoter, suggesting the involvement of triplexes in feedback-based gene regulation. With focus on biotechnological applications, we screen mammalian genomes for high-affinity triplex target sites that can be used to target genomic loci specifically and find that triplex formation offers a resolution of ~1300 nt.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Genomics/methods , Oligonucleotides/chemistry , RNA-Binding Proteins/chemistry , Animals , Chromatin/chemistry , Chromatin/genetics , Circular Dichroism , Computational Biology/methods , DNA/chemistry , DNA/genetics , Genetic Loci , Genome, Human , Humans , Hydrogen Bonding , Nucleic Acid Conformation , Oligonucleotides/genetics , Promoter Regions, Genetic , RNA Stability , RNA-Binding Proteins/genetics , Time Factors
13.
Bioinformatics ; 28(1): 56-62, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22072382

ABSTRACT

MOTIVATION: Accurate knowledge of the genome-wide binding of transcription factors in a particular cell type or under a particular condition is necessary for understanding transcriptional regulation. Using epigenetic data such as histone modification and DNase I accessibility data has been shown to improve motif-based in silico methods for predicting such binding, but this approach has not yet been fully explored. RESULTS: We describe a probabilistic method for combining one or more tracks of epigenetic data with a standard DNA sequence motif model to improve our ability to identify active transcription factor binding sites (TFBSs). We convert each data type into a position-specific probabilistic prior and combine these priors with a traditional probabilistic motif model to compute a log-posterior odds score. Our experiments, using histone modifications H3K4me1, H3K4me3, H3K9ac and H3K27ac, as well as DNase I sensitivity, show conclusively that the log-posterior odds score consistently outperforms a simple binary filter based on the same data. We also show that our approach performs competitively with a more complex method, CENTIPEDE, and suggest that the relative simplicity of the log-posterior odds scoring method makes it an appealing and very general method for identifying functional TFBSs on the basis of DNA and epigenetic evidence. AVAILABILITY AND IMPLEMENTATION: FIMO, part of the MEME Suite software toolkit, now supports log-posterior odds scoring using position-specific priors for motif search. A web server and source code are available at http://meme.nbcr.net. Utilities for creating priors are at http://research.imb.uq.edu.au/t.bailey/SD/Cuellar2011. CONTACT: t.bailey@uq.edu.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Epigenomics , Histone Code , Models, Statistical , Software , Transcription Factors/metabolism , Animals , DNA/chemistry , DNA/metabolism , Gene Expression Regulation , Humans , Nucleotide Motifs , Sequence Analysis, DNA , Transcription Factors/chemistry
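
As a rough illustration of the log-posterior odds scoring described in the abstract above, the sketch below adds a log prior-odds term to a standard motif log-likelihood ratio. The toy motif, the uniform background, the prior values and the function names are all assumptions made for illustration; this is not FIMO's code or interface, and the published method may differ in detail.

```python
import math

# Hypothetical 3-column motif: per-position base probabilities (assumed values).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

# Assumed uniform background model.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_likelihood_ratio(site, pwm, background):
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal motif width")
    return sum(
        math.log2(column[base] / background[base])
        for column, base in zip(pwm, site)
    )


def log_posterior_odds(site, pwm, background, prior):
    """Log-likelihood ratio plus the log odds of a position-specific prior.

    `prior` is the assumed prior probability (strictly between 0 and 1)
    that a binding site starts at this position, e.g. derived from an
    epigenetic data track.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_log_odds = math.log2(prior / (1.0 - prior))
    return log_likelihood_ratio(site, pwm, background) + prior_log_odds


if __name__ == "__main__":
    # The same sequence scores higher where the prior is higher.
    for prior in (0.1, 0.5, 0.9):
        score = log_posterior_odds("ACG", TOY_PWM, BACKGROUND, prior)
        print(f"prior={prior:.1f}  score={score:.3f}")
```

With a prior of 0.5 the prior term vanishes and the score reduces to the plain log-likelihood ratio, which is one way to see how the prior reweights otherwise identical sequence matches.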
14.
Bioinformatics ; 27(13): i7-14, 2011 Jul 01.
Article in English | MEDLINE | ID: mdl-21685104

ABSTRACT

MOTIVATION: Quantitative experimental analyses of the nuclear interior reveal a morphologically structured yet dynamic mix of membraneless compartments. Major nuclear events depend on the functional integrity and timely assembly of these intra-nuclear compartments. Yet, unknown drivers of protein mobility ensure that they are in the right place at the time when they are needed. RESULTS: This study investigates determinants of associations between eight intra-nuclear compartments and their proteins in heterogeneous genome-wide data. We develop a model based on a range of candidate determinants, capable of mapping the intra-nuclear organization of proteins. The model integrates protein interactions, protein domains, post-translational modification sites and protein sequence data. The predictions of our model are accurate with a mean AUC (over all compartments) of 0.71. We present a complete map of the association of 3567 mouse nuclear proteins with intra-nuclear compartments. Each decision is explained in terms of essential interactions and domains, and qualified with a false discovery assessment. Using this resource, we uncover the collective role of transcription factors in each of the compartments. We create diagrams illustrating the outcomes of a Gene Ontology enrichment analysis. Associated with an extensive range of transcription factors, the analysis suggests that PML bodies coordinate regulatory immune responses.


Subject(s)
Bayes Theorem , Cell Nucleus/chemistry , Proteome/analysis , Transcription Factors/analysis , Animals , Cell Nucleus/metabolism , Gene Expression , Intranuclear Inclusion Bodies/chemistry , Intranuclear Inclusion Bodies/metabolism , Mice , Nuclear Proteins/metabolism , Protein Processing, Post-Translational , Protein Transport , Transcription Factors/metabolism
15.
RNA Biol ; 8(3): 427-39, 2011.
Article in English | MEDLINE | ID: mdl-21525785

ABSTRACT

The ability of double-stranded DNA to form a triple-helical structure by hydrogen bonding with a third strand is well established, but the biological functions of these structures remain largely unknown. There is considerable albeit circumstantial evidence for the existence of nucleic triplexes in vivo and their potential participation in a variety of biological processes including chromatin organization, DNA repair, transcriptional regulation, and RNA processing has been investigated in a number of studies to date. There is also a range of possible mechanisms to regulate triplex formation through differential expression of triplex-forming RNAs, alteration of chromatin accessibility, sequence unwinding and nucleotide modifications. With the advent of next generation sequencing technology combined with targeted approaches to isolate triplexes, it is now possible to survey triplex formation with respect to their genomic context, abundance and dynamical changes during differentiation and development, which may open up new vistas in understanding genome biology and gene regulation.


Subject(s)
Nucleic Acids/chemistry , Nucleic Acids/metabolism , Animals , Base Sequence , DNA/chemistry , DNA/metabolism , Humans , Hydrogen Bonding , Models, Biological , Molecular Sequence Data , Nucleic Acid Conformation , RNA/chemistry , RNA/metabolism
16.
BMC Bioinformatics ; 11: 366, 2010 Jul 02.
Article in English | MEDLINE | ID: mdl-20594356

ABSTRACT

BACKGROUND: Quantitative models for transcriptional regulation have shown great promise for advancing our understanding of the biological mechanisms underlying gene regulation. However, all of the models to date assume a transcription factor (TF) to have either activating or repressing function towards all the genes it is regulating. RESULTS: In this paper we demonstrate, using the example of the developmental gene network in D. melanogaster, that the data-fit can be improved by up to 40% if the model allows certain TFs to have dual function, that is, acting as activator for some genes and as repressor for others. We demonstrate that the improvement is not due to additional flexibility in the model but rather derived from the data itself. We also found no evidence for the involvement of other known site-specific TFs in regulating this network. Finally, we propose SUMOylation as a candidate biological mechanism allowing TFs to switch their role when a small ubiquitin-like modifier (SUMO) is covalently attached to the TF. We strengthen this hypothesis by demonstrating that the TFs predicted to have dual function also contain the known SUMO consensus motif, while TFs predicted to have only one role lack this motif. CONCLUSIONS: We argue that a SUMOylation-dependent mechanism allowing TFs to have dual function represents a promising area for further research and might be another step towards uncovering the biological mechanisms underlying transcriptional regulation.


Subject(s)
Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Gene Regulatory Networks , Models, Genetic , Repressor Proteins/metabolism , Transcription Factors/metabolism , Amino Acid Motifs/genetics , Animals , Drosophila melanogaster/embryology , Evolution, Molecular , Gene Expression Regulation, Developmental , Genes, Developmental , Small Ubiquitin-Related Modifier Proteins/metabolism
17.
Bioinformatics ; 26(7): 860-6, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20147307

ABSTRACT

MOTIVATION: Transcription factors (TFs) are crucial during the lifetime of the cell. Their functional roles are defined by the genes they regulate. Uncovering these roles not only sheds light on the TF at hand but puts it into the context of the complete regulatory network. RESULTS: Here, we present an alignment- and threshold-free comparative genomics approach for assigning functional roles to DNA regulatory motifs. We incorporate our approach into the GOMO algorithm, a computational tool for detecting associations between a user-specified DNA regulatory motif [expressed as a position weight matrix (PWM)] and Gene Ontology (GO) terms. Incorporating multiple species into the analysis significantly improves GOMO's ability to identify GO terms associated with the regulatory targets of TFs. Including three comparative species in the process of predicting TF roles in Saccharomyces cerevisiae and Homo sapiens increases the number of significant predictions by 75 and 200%, respectively. The predicted GO terms are also more specific, yielding deeper biological insight into the role of the TF. Adjusting motif (binding) affinity scores for individual sequence composition proves to be essential for avoiding false positive associations. We describe a novel DNA sequence-scoring algorithm that compensates a thermodynamic measure of DNA-binding affinity for individual sequence base composition. GOMO's prediction accuracy proves to be relatively insensitive to how promoters are defined. Because GOMO uses a threshold-free form of gene set analysis, there are no free parameters to tune. Biologists can investigate the potential roles of DNA regulatory motifs of interest using GOMO via the web (http://meme.nbcr.net).


Subject(s)
DNA/chemistry , Genomics/methods , Algorithms , Base Sequence , Binding Sites , Sequence Analysis, DNA , Transcription Factors/chemistry , Transcription Factors/genetics
18.
Proteins ; 77(1): 111-20, 2009 Oct.
Article in English | MEDLINE | ID: mdl-19415757

ABSTRACT

The exploration of novel proteins via recombination of fragments derived from structurally homologous proteins has enormous potential for medicine and biotechnology. This modular exchange of sequence material puts novel activities, substrate specificities, and stability within reach of a semi-random search. This article takes stock of the growing resource of experimentally characterized chimeric proteins within a homologous protein family to build sequence-function models that can effectively guide the construction of new libraries. A novel framework for predicting structural viability of chimeric proteins, only assuming knowledge of their sequence and their parental structure, is presented. Removing a major barrier in previous work, the model processes any sequence that derives from parents with similar folds. The method naturally mixes test and training data from site-directed recombination, DNA shuffling, or random mutagenesis experiments. We train a model from a site-directed recombination library with state-of-the-art prediction accuracy on hold-out test data from the same experimental source and convincing performance on chimeras with a different origin. Specifically, the model is used to assess the structural viability of P450 chimeras deriving from proteins with only 18% sequence similarity to those used for model tuning.


Subject(s)
Computational Biology/methods , Recombinant Fusion Proteins/chemistry , Protein Conformation , Protein Folding , Recombinant Fusion Proteins/genetics , Recombinant Fusion Proteins/metabolism
19.
Nucleic Acids Res ; 37(Web Server issue): W202-8, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19458158

ABSTRACT

The MEME Suite web server provides a unified portal for online discovery and analysis of sequence motifs representing features such as DNA binding sites and protein interaction domains. The popular MEME motif discovery algorithm is now complemented by the GLAM2 algorithm which allows discovery of motifs containing gaps. Three sequence scanning algorithms--MAST, FIMO and GLAM2SCAN--allow scanning numerous DNA and protein sequence databases for motifs discovered by MEME and GLAM2. Transcription factor motifs (including those discovered using MEME) can be compared with motifs in many popular motif databases using the motif database scanning algorithm TOMTOM. Transcription factor motifs can be further analyzed for putative function by association with Gene Ontology (GO) terms using the motif-GO term association tool GOMO. MEME output now contains sequence LOGOS for each discovered motif, as well as buttons to allow motifs to be conveniently submitted to the sequence and motif database scanning algorithms (MAST, FIMO and TOMTOM), or to GOMO, for further analysis. GLAM2 output similarly contains buttons for further analysis using GLAM2SCAN and for rerunning GLAM2 with different parameters. All of the motif-based tools are now implemented as web services via Opal. Source code, binaries and a web server are freely available for noncommercial use at http://meme.nbcr.net.


Subject(s)
Sequence Analysis, DNA , Sequence Analysis, Protein , Software , Algorithms , Binding Sites , Databases, Genetic , Internet , Regulatory Elements, Transcriptional , Transcription Factors/metabolism