1.
PLoS One ; 17(9): e0274338, 2022.
Article in English | MEDLINE | ID: mdl-36084008

ABSTRACT

Gene expression is regulated through cis-regulatory elements (CREs), among which are promoters, enhancers, Polycomb/Trithorax Response Elements (PREs), silencers and insulators. Computational prediction of CREs can be achieved using a variety of statistical and machine learning methods combined with different feature space formulations. Although Python packages for DNA sequence feature sets and for machine learning are available, no existing package facilitates the combination of DNA sequence feature sets with machine learning methods for the genome-wide prediction of candidate CREs. We here present Gnocis, a Python package that streamlines the analysis and the modelling of CRE sequences by providing extensible APIs and implementing the glue required for combining feature sets and models for genome-wide prediction. Gnocis implements a variety of base feature sets, including motif pair occurrence frequencies and the k-spectrum mismatch kernel. It integrates with Scikit-learn and TensorFlow for state-of-the-art machine learning. Gnocis additionally implements a broad suite of tools for the handling and preparation of sequence, region and curve data, which can be useful for general DNA bioinformatics in Python. We also present Deep-MOCCA, a neural network architecture inspired by SVM-MOCCA that achieves moderate to high generalization without prior motif knowledge. To demonstrate the use of Gnocis, we applied multiple machine learning methods to the modelling of D. melanogaster PREs, including a Convolutional Neural Network (CNN), making this the first study to model PREs with CNNs. The models are readily adapted to new CRE modelling problems and to other organisms. In order to produce a high-performance, compiled package for Python 3, we implemented Gnocis in Cython. Gnocis can be installed using the PyPI package manager by running 'pip install gnocis'. The source code is available on GitHub, at https://github.com/bjornbredesen/gnocis.
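The k-spectrum feature space mentioned in the abstract can be illustrated with a minimal sketch. It uses only the Python standard library and does not call the Gnocis API; the function names `k_spectrum` and `spectrum_kernel` are hypothetical, and the sketch counts exact k-mer matches only, so it is the plain k-spectrum rather than the mismatch variant that Gnocis implements.

```python
from collections import Counter
from itertools import product


def k_spectrum(seq, k=3):
    """Return exact-match counts of all 4**k DNA k-mers in seq, in a fixed order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Fixed feature order: AA..A, AA..C, ..., TT..T
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return [counts.get(kmer, 0) for kmer in kmers]


def spectrum_kernel(a, b, k=3):
    """Dot product of two k-spectrum vectors (illustrative, no mismatches)."""
    return sum(x * y for x, y in zip(k_spectrum(a, k), k_spectrum(b, k)))
```

A vector of this kind could then be passed to any classifier that accepts numeric features; how Gnocis wires feature sets to models is not shown here.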


Subject(s)
Drosophila melanogaster , Software , Algorithms , Animals , DNA/genetics , Drosophila melanogaster/genetics , Neural Networks, Computer , Response Elements
3.
BMC Bioinformatics ; 23(1): 39, 2022 Jan 14.
Article in English | MEDLINE | ID: mdl-35030988

ABSTRACT

BACKGROUND: Total-RNA sequencing (total-RNA-seq) allows the simultaneous study of both the coding and the non-coding transcriptome. Yet, computational pipelines have traditionally focused on particular biotypes, making assumptions that are not fulfilled by total-RNA-seq datasets. Transcripts from distinct RNA biotypes vary in length, biogenesis, and function, can overlap in a genomic region, and may be present in the genome with a high copy number. Consequently, reads from total-RNA-seq libraries may cause ambiguous genomic alignments, demanding flexible quantification approaches. RESULTS: Here we present Multi-Graph count (MGcount), a total-RNA-seq quantification tool combining two strategies for handling ambiguous alignments. First, MGcount assigns reads hierarchically to small-RNA and long-RNA features to account for length disparity when transcripts overlap in the same genomic position. Next, MGcount aggregates RNA products with similar sequences where reads systematically multi-map using a graph-based approach. MGcount outputs a transcriptomic count matrix compatible with RNA-sequencing downstream analysis pipelines, with both bulk and single-cell resolution, and the graphs that model repeated transcript structures for different biotypes. The software can be used as a Python module or as a single-file executable program. CONCLUSIONS: MGcount is a flexible total-RNA-seq quantification tool that successfully integrates reads that align to multiple genomic locations or that overlap with multiple gene features. Its approach is suitable for the simultaneous estimation of protein-coding, long non-coding and small non-coding transcript concentration, in both precursor and processed forms. Both source code and compiled software are available at https://github.com/hitaandrea/MGcount .
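The idea of aggregating features that share multi-mapping reads can be sketched as a graph problem. The sketch below is not MGcount's algorithm: it assumes, for illustration only, that features are merged whenever any read aligns to both (connected components via union-find), which is a simplification of whatever graph-based grouping the tool applies. The function name `aggregate_features` and the input layout are hypothetical.

```python
from collections import defaultdict


def aggregate_features(read_alignments):
    """Group features linked by multi-mapping reads and count reads per group.

    read_alignments: dict mapping read id -> set of feature names it aligns to.
    Returns a dict mapping frozenset(feature group) -> number of reads.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link all features hit by the same read (assumed merging rule).
    for features in read_alignments.values():
        ordered = sorted(features)
        if not ordered:
            continue
        find(ordered[0])  # register single-feature reads as well
        for other in ordered[1:]:
            parent[find(ordered[0])] = find(other)

    # Collect the members of each connected component.
    groups = defaultdict(set)
    for feature in list(parent):
        groups[find(feature)].add(feature)

    # Each read contributes one count to the group it falls into.
    counts = defaultdict(int)
    for features in read_alignments.values():
        if features:
            root = find(next(iter(features)))
            counts[frozenset(groups[root])] += 1
    return dict(counts)
```

With this rule, a read that maps to two near-identical snoRNA copies is counted once for the merged group instead of being discarded or double-counted.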


Subject(s)
Software , Transcriptome , RNA , RNA-Seq , Sequence Analysis, RNA
4.
BMC Bioinformatics ; 22(1): 234, 2021 May 07.
Article in English | MEDLINE | ID: mdl-33962556

ABSTRACT

BACKGROUND: Cis-regulatory elements (CREs) are DNA sequence segments that regulate gene expression. Among CREs are promoters, enhancers, Boundary Elements (BEs) and Polycomb Response Elements (PREs), all of which are enriched in specific sequence motifs that form particular occurrence landscapes. We have recently introduced a hierarchical machine learning approach (SVM-MOCCA) in which Support Vector Machines (SVMs) are applied on the level of individual motif occurrences, modelling local sequence composition, and then combined for the prediction of whole regulatory elements. We used SVM-MOCCA to predict PREs in Drosophila and found that it was superior to other methods. However, we did not publish a polished implementation of SVM-MOCCA, which can be useful for other researchers, and we only tested SVM-MOCCA with IUPAC motifs and PREs. RESULTS: We here present an expanded suite for modelling CRE sequences in terms of motif occurrence combinatorics: Motif Occurrence Combinatorics Classification Algorithms (MOCCA). MOCCA contains efficient implementations of several modelling methods, including SVM-MOCCA, and a new method, RF-MOCCA, a Random Forest derivative of SVM-MOCCA. We used SVM-MOCCA and RF-MOCCA to model Drosophila PREs and BEs in cross-validation experiments, making this the first study to model PREs with Random Forests and the first study that applies the hierarchical MOCCA approach to the prediction of BEs. Both models significantly improve generalization to PREs and boundary elements beyond that of previous methods (including 4-spectrum and motif occurrence frequency Support Vector Machines and Random Forests), with RF-MOCCA yielding the best results. CONCLUSION: MOCCA is a flexible and powerful suite of tools for the motif-based modelling of CRE sequences in terms of motif composition. MOCCA can be applied to any new CRE modelling problems where motifs have been identified. MOCCA supports IUPAC and Position Weight Matrix (PWM) motifs. For ease of use, MOCCA implements generation of negative training data, and additionally a mode that requires only that the user specifies positives, motifs and a genome. MOCCA is licensed under the MIT license and is available on GitHub at https://github.com/bjornbredesen/MOCCA .


Subject(s)
Algorithms , Support Vector Machine , Base Sequence , Nucleotide Motifs/genetics , Position-Specific Scoring Matrices
5.
Nucleic Acids Res ; 47(15): 7781-7797, 2019 09 05.
Article in English | MEDLINE | ID: mdl-31340029

ABSTRACT

Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications have led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. 2412 of these are also enriched in H3K4me1, a mark of Trithorax activated chromatin, suggesting that PREs/TREs have a common sequence code.


Subject(s)
Algorithms , DNA/genetics , Drosophila melanogaster/genetics , Genome, Insect , Polycomb-Group Proteins/genetics , Response Elements , Animals , Binding Sites , Chromatin/chemistry , Chromatin/metabolism , Chromosomal Proteins, Non-Histone/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA/chemistry , DNA/metabolism , Drosophila Proteins/genetics , Drosophila Proteins/metabolism , Drosophila melanogaster/metabolism , Embryo, Nonmammalian , Gene Ontology , Histones/genetics , Histones/metabolism , Larva/genetics , Larva/metabolism , Molecular Sequence Annotation , Nucleotide Motifs , Polycomb-Group Proteins/metabolism , Protein Binding , Software , Transcription Factors/genetics , Transcription Factors/metabolism
6.
Bioinformatics ; 33(1): 145-147, 2017 01 01.
Article in English | MEDLINE | ID: mdl-27591081

ABSTRACT

The precision-recall plot is more informative than the ROC plot when evaluating classifiers on imbalanced datasets, but fast and accurate curve calculation tools for precision-recall plots are currently not available. We have developed Precrec, an R library that aims to overcome this limitation of the plot. Our tool provides fast and accurate precision-recall calculations together with multiple functionalities that work efficiently under different conditions. AVAILABILITY AND IMPLEMENTATION: Precrec is licensed under GPL-3 and freely available from CRAN (https://cran.r-project.org/package=precrec). It is implemented in R with C++. CONTACT: takaya.saito@ii.uib.no. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , ROC Curve , Software
7.
F1000Res ; 5: 1531, 2016.
Article in English | MEDLINE | ID: mdl-27540470

ABSTRACT

Identifying functional modules or novel active pathways, recently termed de novo pathway enrichment, is a computational systems biology challenge that has gained much attention during the last decade. Given a large biological interaction network, KeyPathwayMiner extracts connected subnetworks that are enriched for differentially active entities from a series of molecular profiles encoded as binary indicator matrices. Since interaction networks constantly evolve, an important question is how robust the extracted results are when the network is modified. We enable users to study this effect through several network perturbation techniques and over a range of perturbation degrees. In addition, users may now provide a gold-standard set to determine how enriched extracted pathways are with relevant genes compared to randomized versions of the original network.

8.
Nucleic Acids Res ; 44(14): 6639-48, 2016 08 19.
Article in English | MEDLINE | ID: mdl-27330136

ABSTRACT

High-throughput screening (HTS) is an indispensable tool for drug (target) discovery that currently lacks user-friendly software tools for the robust identification of putative hits from HTS experiments and for the interpretation of these findings in the context of systems biology. We developed HiTSeekR as a one-stop solution for chemical compound screens, siRNA knock-down and CRISPR/Cas9 knock-out screens, as well as microRNA inhibitor and -mimics screens. We chose three use cases that demonstrate the potential of HiTSeekR to fully exploit HTS screening data in quite heterogeneous contexts to generate novel hypotheses for follow-up experiments: (i) a genome-wide RNAi screen to uncover modulators of TNFα, (ii) a combined siRNA and miRNA mimics screen on vorinostat resistance and (iii) a small compound screen on KRAS synthetic lethality. HiTSeekR is publicly available at http://hitseekr.compbio.sdu.dk. It is the first approach to close the gap between raw data processing, network enrichment and wet lab target generation for various HTS screen types.


Subject(s)
Drug Evaluation, Preclinical , High-Throughput Screening Assays/methods , Caspases/metabolism , Drug Delivery Systems , Humans , MicroRNAs/genetics , MicroRNAs/metabolism , Quality Control , RNA Interference , Robotics , Signal Transduction , Tumor Necrosis Factor-alpha/metabolism
9.
PLoS One ; 10(3): e0118432, 2015.
Article in English | MEDLINE | ID: mdl-25738806

ABSTRACT

Binary classifiers are routinely evaluated with performance measures such as sensitivity and specificity, and performance is frequently illustrated with Receiver Operating Characteristics (ROC) plots. Alternative measures such as positive predictive value (PPV) and the associated Precision/Recall (PRC) plots are used less frequently. Many bioinformatics studies develop and evaluate classifiers that are to be applied to strongly imbalanced datasets in which the number of negatives outweighs the number of positives significantly. While ROC plots are visually appealing and provide an overview of a classifier's performance across a wide range of specificities, one can ask whether ROC plots could be misleading when applied in imbalanced classification scenarios. We show here that the visual interpretability of ROC plots in the context of imbalanced datasets can be deceptive with respect to conclusions about the reliability of classification performance, owing to an intuitive but wrong interpretation of specificity. PRC plots, on the other hand, can provide the viewer with an accurate prediction of future classification performance due to the fact that they evaluate the fraction of true positives among positive predictions. Our findings have potential implications for the interpretation of a large number of studies that use ROC plots on imbalanced datasets.
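The effect of class imbalance described above can be checked with a small arithmetic sketch. The helper `precision_at` is hypothetical and the class sizes are made-up illustration values, not data from the study: it converts one ROC operating point (true positive rate, false positive rate) into the precision that would be observed for given numbers of positives and negatives.

```python
def precision_at(tpr, fpr, n_pos, n_neg):
    """Precision (PPV) implied by one ROC operating point and the class sizes."""
    tp = tpr * n_pos  # expected true positives
    fp = fpr * n_neg  # expected false positives
    if tp + fp == 0:
        raise ValueError("no positive predictions at this operating point")
    return tp / (tp + fp)


# Same ROC point (TPR 0.90, FPR 0.05), two hypothetical datasets.
balanced = precision_at(0.90, 0.05, n_pos=1000, n_neg=1000)
imbalanced = precision_at(0.90, 0.05, n_pos=1000, n_neg=100000)
print(f"balanced:   {balanced:.3f}")
print(f"imbalanced: {imbalanced:.3f}")
```

The ROC point is identical in both cases, because TPR and FPR are each computed within one class. Precision mixes the two classes, so it drops sharply once negatives outnumber positives a hundredfold.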


Subject(s)
Datasets as Topic , ROC Curve , Classification/methods
10.
Genetics ; 193(4): 1083-94, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23335332

ABSTRACT

Mathematical models of meiosis that relate offspring to parental genotypes through parameters such as meiotic recombination frequency have been difficult to develop for polyploids. Existing models have limitations with respect to their analytic potential, their compatibility with insights into mechanistic aspects of meiosis, and their treatment of model parameters in terms of parameter dependencies. In this article I put forward a computational approach to the probabilistic modeling of meiosis. A computer program enumerates all possible paths through the phases of replication, pairing, recombination, and segregation, while keeping track of the probabilities of the paths according to the various parameters involved. Probabilities for classes of genotypes or phenotypes are added, and the resulting formulas are simplified by the symbolic-computation system Mathematica. An example application to autotetraploids results in a model that remedies the limitations of previous models mentioned above. In addition to the immediate implications, the computational approach presented here can be expected to be useful through opening avenues for modeling a host of processes, including meiosis in higher-order ploidies.


Subject(s)
Meiosis/genetics , Models, Genetic , Polyploidy , Plants/genetics
11.
BMC Biol ; 10: 32, 2012 Apr 18.
Article in English | MEDLINE | ID: mdl-22513177

ABSTRACT

This article is a response to Wang and Luo. See correspondence article http://www.biomedcentral.com/1741-7007/10/30 and the original research article http://www.biomedcentral.com/1741-7007/9/24.


Subject(s)
Arabidopsis/genetics , Chromosome Pairing , Chromosomes, Plant/genetics , Recombination, Genetic , Tetraploidy
12.
PLoS Genet ; 7(6): e1002126, 2011 Jun.
Article in English | MEDLINE | ID: mdl-21698132

ABSTRACT

Genomic imprinting is an epigenetic phenomenon leading to parent-of-origin specific differential expression of maternally and paternally inherited alleles. In plants, genomic imprinting has mainly been observed in the endosperm, an ephemeral triploid tissue derived after fertilization of the diploid central cell with a haploid sperm cell. In an effort to identify novel imprinted genes in Arabidopsis thaliana, we generated deep sequencing RNA profiles of F1 hybrid seeds derived after reciprocal crosses of Arabidopsis Col-0 and Bur-0 accessions. Using polymorphic sites to quantify allele-specific expression levels, we could identify more than 60 genes with potential parent-of-origin specific expression. By analyzing the distribution of DNA methylation and epigenetic marks established by Polycomb group (PcG) proteins using publicly available datasets, we suggest that for maternally expressed genes (MEGs) repression of the paternally inherited alleles largely depends on DNA methylation or PcG-mediated repression, whereas repression of the maternal alleles of paternally expressed genes (PEGs) predominantly depends on PcG proteins. While maternal alleles of MEGs are also targeted by PcG proteins, such targeting does not cause complete repression. Candidate MEGs and PEGs are enriched for cis-proximal transposons, suggesting that transposons might be a driving force for the evolution of imprinted genes in Arabidopsis. In addition, we find that MEGs and PEGs are significantly faster evolving when compared to other genes in the genome. In contrast to the predominant location of mammalian imprinted genes in clusters, cluster formation was only detected for few MEGs and PEGs, suggesting that clustering is not a major requirement for imprinted gene regulation in Arabidopsis.


Subject(s)
Alleles , Arabidopsis/genetics , Endosperm/genetics , Gene Expression Regulation, Plant , Animals , DNA Methylation/genetics , DNA Transposable Elements/genetics , Evolution, Molecular , Gene Expression Profiling , Genes, Plant , Genome, Plant/genetics , Genomic Imprinting , Multigene Family/genetics , Polycomb-Group Proteins , Repressor Proteins/metabolism , Seeds/genetics
13.
BMC Biol ; 9: 24, 2011 Apr 21.
Article in English | MEDLINE | ID: mdl-21510849

ABSTRACT

BACKGROUND: Polyploidization is the multiplication of the whole chromosome complement and has occurred frequently in vascular plants. Maintenance of stable polyploid state over generations requires special mechanisms to control pairing and distribution of more than two homologous chromosomes during meiosis. Since a minimal number of crossover events is essential for correct chromosome segregation, we investigated whether polyploidy has an influence on the frequency of meiotic recombination. RESULTS: Using two genetically linked transgenes providing seed-specific fluorescence, we compared a high number of progeny from diploid and tetraploid Arabidopsis plants. We show that rates of meiotic recombination in reciprocal crosses of genetically identical diploid and autotetraploid Arabidopsis plants were significantly higher in tetraploids compared to diploids. Although male and female gametogenesis differ substantially in meiotic recombination frequency, both rates were equally increased in tetraploids. To investigate whether multivalent formation in autotetraploids was responsible for the increased recombination rates, we also performed corresponding experiments with allotetraploid plants showing strict bivalent pairing. We found similarly increased rates in auto- and allotetraploids, suggesting that the ploidy effect is independent of chromosome pairing configurations. CONCLUSIONS: The evolutionary success of polyploid plants in nature and under domestication has been attributed to buffering of mutations and sub- and neo-functionalization of duplicated genes. Should the data described here be representative for polyploid plants, enhanced meiotic recombination, and the resulting rapid creation of genetic diversity, could have also contributed to their prevalence.


Subject(s)
Arabidopsis/genetics , Chromosome Pairing , Chromosomes, Plant/genetics , Recombination, Genetic , Tetraploidy , Arabidopsis/cytology , Biological Evolution , Gametogenesis, Plant , Plants, Genetically Modified/genetics
14.
Nucleic Acids Res ; 37(12): 4010-21, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19417064

ABSTRACT

MicroRNAs (miRNAs) are 20-24 nt long endogenous non-coding RNAs that act as post-transcriptional regulators in metazoa and plants. Plant miRNA targets typically contain a single sequence motif with near-perfect complementarity to the miRNA. Here, we extended and applied the program RNAhybrid to identify novel miRNA targets in the complete annotated Arabidopsis thaliana transcriptome. RNAhybrid predicts the energetically most favorable miRNA:mRNA hybrids that are consistent with user-defined structural constraints. These were: (i) perfect base pairing of the duplex from nucleotide 8 to 12 counting from the 5'-end of the miRNA; (ii) loops with a maximum length of one nucleotide in either strand; (iii) bulges with no more than one nucleotide in size; and (iv) unpaired end overhangs not longer than two nucleotides. G:U base pairs are not treated as mismatches, but contribute less favorably to the overall free energy. The resulting hybrids were filtered according to their minimum free energy, resulting in an overall prediction of more than 600 novel miRNA targets. The specificity and signal-to-noise ratio of the prediction were assessed with either randomized miRNAs or randomized target sequences as negative controls. Our results are in line with recent observations that the majority of miRNA targets are not transcription factors.


Subject(s)
Arabidopsis/genetics , MicroRNAs/chemistry , RNA, Messenger/chemistry , RNA, Plant/chemistry , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Gene Expression Profiling , Gene Expression Regulation, Plant , Software
15.
Nat Cell Biol ; 11(6): 705-16, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19465924

ABSTRACT

The microRNA pathway has been implicated in the regulation of synaptic protein synthesis and ultimately in dendritic spine morphogenesis, a phenomenon associated with long-lasting forms of memory. However, the particular microRNAs (miRNAs) involved are largely unknown. Here we identify specific miRNAs that function at synapses to control dendritic spine structure by performing a functional screen. One of the identified miRNAs, miR-138, is highly enriched in the brain, localized within dendrites and negatively regulates the size of dendritic spines in rat hippocampal neurons. miR-138 controls the expression of acyl protein thioesterase 1 (APT1), an enzyme regulating the palmitoylation status of proteins that are known to function at the synapse, including the α13 subunits of G proteins (Gα13). RNA-interference-mediated knockdown of APT1 and the expression of membrane-localized Gα13 both suppress spine enlargement caused by inhibition of miR-138, suggesting that APT1-regulated depalmitoylation of Gα13 might be an important downstream event of miR-138 function. Our results uncover a previously unknown miRNA-dependent mechanism in neurons and demonstrate a previously unrecognized complexity of miRNA-dependent control of dendritic spine morphogenesis.


Subject(s)
Dendritic Spines , MicroRNAs/metabolism , Synapses , Thiolester Hydrolases/metabolism , Animals , Base Sequence , Cell Line , Dendritic Spines/enzymology , Dendritic Spines/ultrastructure , GTP-Binding Protein alpha Subunits, G12-G13/metabolism , Gene Expression Profiling , Hippocampus/cytology , Humans , Lipoylation , Mice , Mice, Inbred C57BL , MicroRNAs/genetics , Molecular Sequence Data , Morphogenesis , Neurons/cytology , Neurons/metabolism , Oligonucleotide Array Sequence Analysis , Rats , Receptors, Glutamate/metabolism , Synapses/metabolism , Synapses/ultrastructure , Thiolester Hydrolases/antagonists & inhibitors , Thiolester Hydrolases/genetics
16.
Bioinformatics ; 25(8): 1084-5, 2009 Apr 15.
Article in English | MEDLINE | ID: mdl-19246510

ABSTRACT

We introduce the tool mkESA, an open source program for constructing enhanced suffix arrays (ESAs), striving for low memory consumption, yet high practical speed. mkESA is a user-friendly program written in portable C99, based on a parallelized version of the Deep-Shallow suffix array construction algorithm, which is known for its high speed and small memory usage. The tool handles large FASTA files with multiple sequences, and computes suffix arrays and various additional tables, such as the LCP table (longest common prefix) or the inverse suffix array, from given sequence data.
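The three tables named in the abstract (suffix array, inverse suffix array and LCP table) can be illustrated on a short string. This sketch is not mkESA's method: it builds the suffix array by naive sorting instead of the Deep-Shallow algorithm, and computes the LCP table with the linear-time algorithm of Kasai et al. All function names are hypothetical.

```python
def suffix_array(text):
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def inverse_suffix_array(sa):
    """inv[pos] is the rank of the suffix starting at pos."""
    inv = [0] * len(sa)
    for rank, pos in enumerate(sa):
        inv[pos] = rank
    return inv


def lcp_table(text, sa):
    """lcp[r] = longest common prefix of the suffixes at ranks r-1 and r; lcp[0] = 0."""
    n = len(text)
    inv = inverse_suffix_array(sa)
    lcp = [0] * n
    h = 0  # current match length, reused between iterations (Kasai et al.)
    for i in range(n):
        r = inv[i]
        if r == 0:
            h = 0
            continue
        j = sa[r - 1]
        while i + h < n and j + h < n and text[i + h] == text[j + h]:
            h += 1
        lcp[r] = h
        if h > 0:
            h -= 1
    return lcp


sa = suffix_array("banana")
print(sa)                       # [5, 3, 1, 0, 4, 2]
print(lcp_table("banana", sa))  # [0, 1, 3, 0, 0, 2]
```

The naive sort copies suffixes and is only suitable for short inputs. Construction algorithms such as Deep-Shallow exist precisely to avoid that cost on genome-scale sequences.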


Subject(s)
Algorithms , Computational Biology/methods , Sequence Analysis/methods , Software , Sequence Analysis, DNA/methods , Sequence Analysis, Protein/methods , Sequence Analysis, RNA/methods
17.
PLoS Biol ; 6(10): e261, 2008 Oct 28.
Article in English | MEDLINE | ID: mdl-18959483

ABSTRACT

cis-Regulatory DNA elements contain multiple binding sites for activators and repressors of transcription. Among these elements are enhancers, which establish gene expression states, and Polycomb/Trithorax response elements (PREs), which take over from enhancers and maintain transcription states of several hundred developmentally important genes. PREs are essential to the correct identities of both stem cells and differentiated cells. Evolutionary differences in cis-regulatory elements are a rich source of phenotypic diversity, and functional binding sites within regulatory elements turn over rapidly in evolution. However, more radical evolutionary changes that go beyond motif turnover have been difficult to assess. We used a combination of genome-wide bioinformatic prediction and experimental validation at specific loci, to evaluate PRE evolution across four Drosophila species. Our results show that PRE evolution is extraordinarily dynamic. First, we show that the numbers of PREs differ dramatically between species. Second, we demonstrate that functional binding sites within PREs at conserved positions turn over rapidly in evolution, as has been observed for enhancer elements. Finally, although it is theoretically possible that new elements can arise out of nonfunctional sequence, evidence that they do so is lacking. We show here that functional PREs are found at nonorthologous sites in conserved gene loci. By demonstrating that PRE evolution is not limited to the adaptation of preexisting elements, these findings document a novel dimension of cis-regulatory evolution.


Subject(s)
Chromosomal Proteins, Non-Histone/genetics , Drosophila Proteins/genetics , Drosophila/genetics , Evolution, Molecular , Response Elements/genetics , Animals , Blotting, Western , Chromatin Immunoprecipitation , Chromosomal Proteins, Non-Histone/metabolism , Computational Biology/methods , Drosophila/classification , Drosophila/metabolism , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Genome/genetics , Phylogeny , Polycomb Repressive Complex 1 , Species Specificity
18.
Methods Mol Biol ; 342: 87-99, 2006.
Article in English | MEDLINE | ID: mdl-16957369

ABSTRACT

I describe the use of RNAhybrid, a program that predicts multiple potential binding sites of microRNAs (miRNAs) in large target RNAs. The core algorithm finds the energetically most favorable hybridization sites of a miRNA in a large potential target RNA. Intramolecular hybridizations, i.e., base pairings between target nucleotides or between miRNA nucleotides, are not allowed. For large targets, the time complexity of the algorithm is linear in the target length, allowing many long targets to be searched in a short time. Starting from the observation that the binding energies are results from an optimization procedure, we can model them as following an extreme value distribution. From this, we can calculate the statistical significance of individual binding sites, of multiple binding sites in a single target sequence, and of binding sites in comparative analyses of orthologous sequences across species. The latter involves the calculation of the effective number of orthologous sequences, which can be considerably smaller than the actual number, reflecting the statistical dependence of evolutionarily related sequences.
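The significance calculation for a single binding site can be sketched with the Gumbel (type I extreme value) distribution. This is an illustration under stated assumptions, not RNAhybrid's implementation: the score is taken to be a quantity where larger means stronger binding (for example a negated, normalised free energy), and the `location` and `scale` parameters are hypothetical values that would in practice be estimated from hybridizations against random sequences.

```python
import math


def gumbel_p_value(score, location, scale):
    """P(S >= score) for a Gumbel-distributed score S.

    The Gumbel CDF is exp(-exp(-z)) with z = (score - location) / scale,
    so the upper-tail probability is 1 - exp(-exp(-z)).
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    z = (score - location) / scale
    # -expm1(x) computes 1 - exp(x) accurately when the result is small.
    return -math.expm1(-math.exp(-z))


# Hypothetical parameters, for illustration only.
for s in (2.0, 4.0, 6.0):
    print(s, gumbel_p_value(s, location=2.0, scale=0.8))
```

The p-value shrinks roughly exponentially as the score moves into the upper tail, which is what allows a small number of very strong sites to stand out against many chance hybridizations.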


Subject(s)
MicroRNAs/genetics , MicroRNAs/metabolism , Algorithms , Animals , Binding Sites , Computer Graphics , Humans , Internet , MicroRNAs/chemistry , Models, Genetic , Nucleic Acid Conformation , Software , Thermodynamics , User-Computer Interface
19.
Nucleic Acids Res ; 34(Web Server issue): W451-4, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845047

ABSTRACT

In the elucidation of the microRNA regulatory network, knowledge of potential targets is of highest importance. Among existing target prediction methods, RNAhybrid [M. Rehmsmeier, P. Steffen, M. Höchsmann and R. Giegerich (2004) RNA, 10, 1507-1517] is unique in offering a flexible online prediction. Recently, some useful features have been added, among these the possibility to disallow G:U base pairs in the seed region, and a seed-match speed-up, which accelerates the program by a factor of 8. In addition, the program can now be used as a webservice for remote calls from user-implemented programs. We demonstrate RNAhybrid's flexibility with the prediction of a non-canonical target site for Caenorhabditis elegans miR-241 in the 3'-untranslated region of lin-39. RNAhybrid is available at http://bibiserv.techfak.uni-bielefeld.de/rnahybrid.


Subject(s)
MicroRNAs/chemistry , RNA Interference , Software , 3' Untranslated Regions/chemistry , Animals , Binding Sites , Caenorhabditis elegans/genetics , Caenorhabditis elegans Proteins/genetics , Homeodomain Proteins/genetics , Internet
20.
Nucleic Acids Res ; 34(Web Server issue): W546-50, 2006 Jul 01.
Article in English | MEDLINE | ID: mdl-16845067

ABSTRACT

Gene regulation is the process through which an organism effects spatial and temporal differences in gene expression levels. Knowledge of cis-regulatory elements as key players in gene regulation is indispensable for the understanding of the latter and of the development of organisms. Here we present the tool jPREdictor for the fast and versatile prediction of cis-regulatory elements on a genome-wide scale. The prediction is based on clusters of individual motifs and any combination of these into multi-motifs with selectable minimal and maximal distances. Individual motifs can be of heterogeneous classes, such as simple sequence motifs or position-specific scoring matrices. Cluster scores are weighted occurrences of multi-motifs, where the weights are derived from positive and negative training sets. We illustrate the flexibility of the jPREdictor with a new prediction of Polycomb/Trithorax Response Elements in Drosophila melanogaster. jPREdictor is available as a graphical user interface for online use and for download at http://bibiserv.techfak.uni-bielefeld.de/jpredictor.
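A weighted-occurrence score of the kind described can be sketched as follows. The abstract does not give the weighting formula, so the log-odds weights used here are an assumption, as are the pseudocount and all function names. The sketch also scores single exact-match motifs only and leaves out multi-motifs with distance constraints.

```python
import math


def count_occurrences(seq, motif):
    """Count possibly overlapping exact occurrences of motif in seq."""
    m = len(motif)
    return sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif)


def log_odds_weights(motifs, positives, negatives, pseudocount=1.0):
    """Assumed weighting: log ratio of per-base motif frequency in positives vs negatives."""
    pos_len = sum(len(s) for s in positives)
    neg_len = sum(len(s) for s in negatives)
    weights = {}
    for motif in motifs:
        pos = sum(count_occurrences(s, motif) for s in positives) + pseudocount
        neg = sum(count_occurrences(s, motif) for s in negatives) + pseudocount
        weights[motif] = math.log((pos / pos_len) / (neg / neg_len))
    return weights


def window_score(window, weights):
    """Score a sequence window as the weighted sum of motif occurrences."""
    return sum(w * count_occurrences(window, motif) for motif, w in weights.items())
```

Sliding `window_score` along a genome and keeping windows above a cutoff would give candidate elements. The choice of cutoff is not covered by this sketch.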


Subject(s)
Genomics/methods , Regulatory Elements, Transcriptional , Software , Animals , Binding Sites , Drosophila Proteins/metabolism , Drosophila melanogaster/genetics , Internet , Polycomb Repressive Complex 1 , Response Elements , Transcription Factors/metabolism , User-Computer Interface