Results 1 - 13 of 13
1.
IEEE/ACM Trans Comput Biol Bioinform ; 20(5): 2736-2747, 2023.
Article in English | MEDLINE | ID: mdl-34633933

ABSTRACT

RNA elements that are transcribed but not translated into proteins are called non-coding RNAs (ncRNAs). They play wide-ranging roles in biological processes and disorders. Just like proteins, their structure is often intimately linked to their function. Many examples have been documented where structure is conserved across taxa despite sequence divergence. Thus, structure is often used to identify function. Specifically, the secondary structure is predicted and ncRNAs with similar structures are assumed to have the same or similar functions. However, a strand of RNA can fold into multiple possible structures, and some strands even fold differently in vivo and in vitro. Furthermore, ncRNAs often function as RNA-protein complexes, which can affect structure. For these reasons, we hypothesized that using one structure per sequence may discard information, possibly resulting in poorer classification accuracy. Therefore, we propose using secondary structure fingerprints, comprising two categories: a higher-level representation derived from RNA-As-Graphs (RAG), and free energy fingerprints based on a curated repertoire of small structural motifs. The fingerprints take into account the difference between global and local structural matches. We also evaluated our deep learning architecture with k-mers. By combining our global-local fingerprints with 6-mers, we achieved an accuracy, precision, and recall of 91.04%, 91.10%, and 91.00%, respectively.
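
The k-mer input mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a plain normalized frequency vector over the ACGU alphabet; the function name `kmer_profile` and the normalization are illustrative assumptions, not the encoding used in the paper.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq: str, k: int = 6) -> list[float]:
    """Return normalized k-mer frequencies over the RNA alphabet ACGU.

    Hypothetical helper: the paper's actual feature encoding is not given
    in the abstract.
    """
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k k-mers in a fixed order so vectors are comparable.
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    # Windows containing other characters (e.g. N) are not in `kmers`,
    # so they are ignored in both the counts used and the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]
```

With `k=6` the vector has 4^6 = 4096 entries, one per possible 6-mer.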

2.
BMC Bioinformatics ; 22(1): 69, 2021 Feb 15.
Article in English | MEDLINE | ID: mdl-33588754

ABSTRACT

BACKGROUND: Chromatin immunoprecipitation followed by high throughput sequencing (ChIP-seq), initially introduced more than a decade ago, is widely used by the scientific community to detect protein/DNA binding and histone modifications across the genome. Every experiment is prone to noise and bias, and ChIP-seq experiments are no exception. To alleviate bias, the incorporation of control datasets in ChIP-seq analysis is an essential step. The controls are used to account for the background signal, while the remainder of the ChIP-seq signal captures true binding or histone modification. However, a recurrent issue is different types of bias in different ChIP-seq experiments. Depending on which controls are used, different aspects of ChIP-seq bias are better or worse accounted for, and peak calling can produce different results for the same ChIP-seq experiment. Consequently, generating "smart" controls, which model the non-signal effect for a specific ChIP-seq experiment, could enhance contrast and increase the reliability and reproducibility of the results. RESULT: We propose a peak calling algorithm, Weighted Analysis of ChIP-seq (WACS), which is an extension of the well-known peak caller MACS2. There are two main steps in WACS: First, weights are estimated for each control using non-negative least squares regression. The goal is to customize controls to model the noise distribution for each ChIP-seq experiment. This is then followed by peak calling. We demonstrate that WACS significantly outperforms MACS2 and AIControl, another recent algorithm for generating smart controls, in the detection of enriched regions along the genome, in terms of motif enrichment and reproducibility analyses. CONCLUSIONS: This ultimately improves our understanding of ChIP-seq controls and their biases, and shows that WACS results in a better approximation of the noise distribution in controls.


Subjects
Chromatin Immunoprecipitation Sequencing, High-Throughput Nucleotide Sequencing, Algorithms, Chromatin Immunoprecipitation, Reproducibility of Results, DNA Sequence Analysis
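
The control-weighting step described in the WACS abstract above (non-negative least squares over several controls) can be sketched as follows. This is not the WACS implementation: it uses a simple projected gradient descent in place of a dedicated NNLS solver, and the function name, argument layout, and iteration count are assumptions made for illustration.

```python
def nnls_weights(controls, treatment, iters=5000):
    """Estimate weights w >= 0 minimizing ||sum_j w[j]*controls[j] - treatment||^2.

    controls:  list of control coverage vectors, one list of floats per control.
    treatment: ChIP coverage vector over the same genomic bins.
    Illustrative projected-gradient solver, not the solver used by WACS.
    """
    m, n = len(controls), len(treatment)
    # Gram matrix G = C C^T and vector h = C y; the gradient is G w - h.
    gram = [[sum(controls[i][t] * controls[j][t] for t in range(n))
             for j in range(m)] for i in range(m)]
    h = [sum(controls[i][t] * treatment[t] for t in range(n)) for i in range(m)]
    # The trace bounds the largest eigenvalue of G, giving a safe step size.
    step = 1.0 / max(sum(gram[i][i] for i in range(m)), 1e-12)
    w = [0.0] * m
    for _ in range(iters):
        grad = [sum(gram[i][j] * w[j] for j in range(m)) - h[i] for i in range(m)]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[i] - step * grad[i]) for i in range(m)]
    return w
```

The resulting weights would then define a single combined control track for peak calling; in practice a library NNLS routine would replace this loop.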
3.
BMC Bioinformatics ; 15 Suppl 13: S2, 2014.
Article in English | MEDLINE | ID: mdl-25434643

ABSTRACT

Frequent subgraph mining is a useful method for extracting meaningful patterns from a set of graphs or a single large graph. Here, the graph represents all possible RNA structures and interactions. Patterns that are significantly more frequent in this graph than in a random graph are extracted. We hypothesize that these patterns are most likely to represent biological mechanisms. The graph representation used is a directed dual graph, extended to handle intermolecular interactions. The graph is sampled for subgraphs, which are labeled using a canonical labeling method and counted. The resulting patterns are compared to those created from a randomized dataset and scored. The algorithm was applied to the mitochondrial genome of the kinetoplastid species Trypanosoma brucei, which has a unique RNA editing mechanism. The most significant patterns contain two stem-loops, indicative of gRNA, and represent interactions of these structures with target mRNA.


Subjects
Algorithms, Computer Graphics, RNA Editing, RNA/chemistry, Trypanosoma brucei brucei/genetics, Base Sequence, Meiotic Prophase I/genetics, Molecular Models, Molecular Sequence Data, Nucleic Acid Conformation, RNA/classification, RNA/metabolism
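
The "label canonically, then count" step described in the abstract above can be sketched for very small directed subgraphs. The brute-force search over node permutations below is an assumption chosen for clarity; the abstract does not say which canonical labeling method was used, and both function names are hypothetical.

```python
from collections import Counter
from itertools import permutations


def canonical_label(n_nodes, edges):
    """Smallest relabeled edge list over all node permutations.

    Brute force, so only practical for subgraphs with a handful of nodes.
    """
    best = None
    for perm in permutations(range(n_nodes)):
        relabeled = tuple(sorted((perm[u], perm[v]) for u, v in edges))
        if best is None or relabeled < best:
            best = relabeled
    return best


def count_patterns(subgraphs):
    """Count sampled subgraphs up to isomorphism.

    subgraphs: iterable of (n_nodes, list of directed (u, v) edges).
    """
    return Counter((n, canonical_label(n, edges)) for n, edges in subgraphs)
```

Isomorphic subgraphs receive the same label, so their counts accumulate under one key and can then be compared with counts from a randomized dataset.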
4.
J Bioinform Comput Biol ; 12(5): 1450027, 2014 Oct.
Article in English | MEDLINE | ID: mdl-25362841

ABSTRACT

The discovery of common RNA secondary structure motifs is an important problem in bioinformatics. The presence of such motifs is usually associated with key biological functions. However, the identification of structural motifs is far from easy. Unlike motifs in sequences, which have conserved bases, structural motifs have common structure arrangements even if the underlying sequences are different. Over the past few years, hundreds of algorithms have been published for the discovery of sequential motifs, while less work has been done on structural motifs. Current structural motif discovery algorithms are limited in terms of accuracy and scalability. In this paper, we present an incremental and scalable algorithm for discovering RNA secondary structure motifs, namely IncMD. We treat structural motif discovery as a frequent pattern mining problem and tackle it using a modified Apriori algorithm. IncMD uses trie-based linked lists of prefixes (LLP) as data structures to accelerate the search and retrieval of patterns, support counting, and candidate generation. We modify the candidate generation step in order to adapt it to the RNA secondary structure representation. IncMD constructs the frequent patterns incrementally from RNA secondary structure basic elements, using nesting and joining operations. The notion of a motif group is introduced in order to simulate an alignment of motifs that only differ in the number of unpaired bases. In addition, we use a cluster beam approach to select motifs that will survive to the next iterations of the search. Results indicate that IncMD can perform better than some of the available structural motif discovery algorithms in terms of sensitivity (Sn), positive predictive value (PPV), and specificity (Sp). The empirical results also show that the algorithm is scalable and runs faster than all of the compared algorithms.


Subjects
Algorithms, Nucleic Acid Conformation, RNA/chemistry, Base Sequence, Computational Biology, Computer Simulation, Data Mining, Nucleic Acid Databases, Molecular Models
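
The support-counting and candidate-generation steps named in the IncMD abstract above follow the general Apriori scheme, which can be sketched on plain item sets. This is the textbook algorithm only: it does not model IncMD's trie-based prefix lists, nesting and joining operations, or motif groups, and the use of structural element names as "items" is an illustrative assumption.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(pattern): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = [frozenset([i]) for i in sorted(items)]
    frequent = {}
    k = 1
    while current:
        # Support counting: how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate generation: join frequent k-sets into (k+1)-sets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # ... and prune any candidate that has an infrequent k-subset.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        ]
        k += 1
    return frequent
```

The pruning step is what keeps the search tractable: a pattern is only counted if all of its sub-patterns were already found to be frequent.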
5.
RNA Biol ; 10(2): 301-13, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23324603

ABSTRACT

We previously reported a unique genome with systematically fragmented genes and gene pieces dispersed across numerous circular chromosomes, occurring in mitochondria of diplonemids. Genes are split into up to 12 short fragments (modules), which are separately transcribed and joined in a way that differs from known trans-splicing. Further, cox1 mRNA includes six non-encoded uridines indicating RNA editing. In the absence of recognizable cis-elements, we postulated that trans-splicing and RNA editing are directed by trans-acting molecules. Here, we provide insight into the post-transcriptional processes by investigating transcription, RNA processing, trans-splicing and RNA editing in cox1 and at a newly discovered site in cob. We show that module precursor transcripts are up to several thousand nt long and processed accurately at their 5' and 3' termini to yield the short coding-only regions. Processing at 5' and 3' ends occurs independently, and a processed terminus engages in trans-splicing even if the module's other terminus is yet unprocessed. Moreover, only cognate module transcripts join, though without directionality. In contrast, module transcripts requiring RNA editing only trans-splice when editing is completed. Finally, experimental and computational analyses suggest the existence of RNA trans-factors with the potential for guiding both trans-splicing and RNA editing.


Subjects
Euglenozoa/genetics, Mitochondrial Genes, Protozoan Genes, Mitochondria/genetics, Protozoan RNA/metabolism, Base Sequence, Chromosomes/genetics, Chromosomes/metabolism, Cyclooxygenase 1/genetics, Cyclooxygenase 1/metabolism, Mitochondria/metabolism, Molecular Sequence Data, Polyadenylation, RNA Editing, Messenger RNA/genetics, Messenger RNA/metabolism, Protozoan RNA/genetics, Trans-Splicing, Genetic Transcription
6.
Bioinformatics ; 27(17): 2391-8, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21743060

ABSTRACT

MOTIVATION: Annotation Enrichment Analysis (AEA) is a widely used analytical approach to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information (e.g. GO annotations) for sets of genes identified by experiments (e.g. a set of differentially expressed genes, a cluster). The discovered information is utilized by human experts to find biological interpretations of the experiments. However, AEA isolates and tests for overrepresentation only individual annotation terms or groups of similar terms and is limited in its ability to uncover complex phenomena involving relationships between multiple annotation terms from various knowledge bases. Also, AEA assumes that annotations describe the whole object of interest, which makes it difficult to apply to sets of compound objects (e.g. sets of protein-protein interactions) and to sets of objects having an internal structure (e.g. protein complexes). RESULTS: We propose a novel logic-based Annotation Concept Synthesis and Enrichment Analysis (ACSEA) approach. ACSEA fuses inductive logic reasoning with statistical inference to uncover more complex phenomena captured by the experiments. We evaluate our approach on large-scale datasets from several microarray experiments and on a clustered genome-wide genetic interaction network using different biological knowledge bases. The discovered interpretations have lower P-values than the interpretations found by AEA, are highly integrative in nature, and include analysis of quantitative and structured information present in the knowledge bases. The results suggest that ACSEA can boost the effectiveness of the processing of high-throughput experiments. CONTACT: mjiline@site.uottawa.ca.


Subjects
Genomics/methods, High-Throughput Screening Assays, Molecular Sequence Annotation, Proteomics/methods, Algorithms, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis
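
The per-term overrepresentation test that the abstract above attributes to conventional AEA can be illustrated with a one-sided hypergeometric test. The abstract does not name the statistic used, so the choice of the hypergeometric test (a common option for this purpose) and the function and parameter names are assumptions.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int, sample: int, hits: int) -> float:
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population: number of genes in the background set
    annotated:  background genes carrying the annotation term
    sample:     size of the gene set identified by the experiment
    hits:       genes in that set carrying the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

A small P-value indicates that the term occurs in the gene set more often than expected by chance; testing many terms would additionally call for a multiple-testing correction.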
7.
Mol Biol Evol ; 28(9): 2425-8, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21436119

ABSTRACT

In the protist Diplonema papillatum (Diplonemea, Euglenozoa), mitochondrial genes are systematically fragmented with each nonoverlapping piece (module) encoded individually on a distinct circular chromosome. Gene modules are transcribed separately, and precursor transcripts are assembled to mature mRNA by a trans-splicing process of yet unknown mechanism. Expression of the cox1 gene, which consists of nine modules, also involves RNA editing by which six uridines are added between Modules 4 and 5. Here, we investigate whether the unusual features of cox1 are shared by all Diplonemea and what the mechanism of trans-splicing might be. In addition to D. papillatum, described before, we examine three further species representing both Diplonemea genera, namely D. ambulator, Diplonema sp. 2, and Rhynchopus euleeides, and discover that in all Diplonemea, the cox1 gene is discontinuous and split up into nine modules that each reside on a distinct chromosome. Positions of gene breakpoints vary by up to two nucleotides. Further, all taxa have six nonencoded uridines inserted in cox1 mRNA at exactly the same position as D. papillatum. In silico searches do not detect signatures of introns known to engage in trans-splicing, in particular Group I, Group II, spliceosomal, and transfer RNA introns. Nor did we find statistically significant reverse-complementary motifs between adjacent modules and their flanking regions, or residues conserved within or across species. This provides compelling evidence that trans-splicing in Diplonemea mitochondria does not rely on sequence elements in cis but rather proceeds by a mechanism employing matchmaking trans factors, such as RNAs or proteins.


Subjects
Cyclooxygenase 1/genetics, Molecular Evolution, Meiotic Prophase I/genetics, Trans-Splicing/genetics, Base Sequence, Mitochondrial Genome, Introns, Molecular Sequence Data, Phylogeny, RNA Editing/genetics
8.
Nucleic Acids Res ; 35(14): 4664-77, 2007.
Article in English | MEDLINE | ID: mdl-17591613

ABSTRACT

Internal ribosome entry sites (IRES) allow ribosomes to be recruited to mRNA in a cap-independent manner. Some viruses that impair cap-dependent translation initiation utilize IRES to ensure that the viral RNA will efficiently compete for the translation machinery. IRES are also employed for the translation of a subset of cellular messages during conditions that inhibit cap-dependent translation initiation. IRES from viruses like Hepatitis C and Classical Swine Fever virus share a similar structure/function without sharing primary sequence similarity. Of the cellular IRES structures derived so far, none were shown to share an overall structural similarity. Therefore, we undertook a genome-wide search of human 5'UTRs (untranslated regions) with an empirically derived structure of the IRES from the key inhibitor of apoptosis, X-linked inhibitor of apoptosis protein (XIAP), to identify novel IRES that share structure/function similarity. Three of the top matches identified by this search that exhibit IRES activity are the 5'UTRs of Aquaporin 4 (AQP4), ELG1 and NF-kappaB repressing factor (NRF). The structures of AQP4 and ELG1 IRES have limited similarity to the XIAP IRES; however, they share trans-acting factors that bind the XIAP IRES. We therefore propose that cellular IRES are not defined by overall structure, as viral IRES are, but are instead dependent upon short motifs and trans-acting factors for their function.


Subjects
5' Untranslated Regions/chemistry, Protein Biosynthesis, 5' Untranslated Regions/metabolism, Aquaporin 4/genetics, Base Sequence, Binding Sites, Cell Line, DNA-Binding Proteins/genetics, Human Genome, Genomics, Humans, Molecular Sequence Data, Nucleic Acid Conformation, RNA-Binding Proteins/metabolism, Repressor Proteins/genetics, X-Linked Inhibitor of Apoptosis Protein/genetics
9.
BMC Bioinformatics ; 8: 190, 2007 Jun 08.
Article in English | MEDLINE | ID: mdl-17559658

ABSTRACT

BACKGROUND: In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides. RESULTS: The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure. CONCLUSION: We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.


Subjects
Algorithms, Chemical Models, Molecular Models, RNA/chemistry, RNA/ultrastructure, Sequence Alignment/methods, RNA Sequence Analysis/methods, Base Pair Mismatch, Base Sequence, Computer Simulation, Molecular Sequence Data, Nucleic Acid Conformation
10.
RNA ; 12(10): 1755-85, 2006 Oct.
Article in English | MEDLINE | ID: mdl-16957278

ABSTRACT

The cell has many ways to regulate the production of proteins. One mechanism is through changes to the machinery of translation initiation. These alterations favor the translation of one subset of mRNAs over another. It was first shown that internal ribosome entry sites (IRESes) within viral RNA genomes allowed the production of viral proteins more efficiently than most of the host proteins. The RNA secondary structure of viral IRESes has sometimes been conserved between viral species even though the primary sequences differ. These structures are important for IRES function, but no similar structure conservation has yet been shown in cellular IRESes. With the advances in mathematical modeling and computational approaches to complex biological problems, is there a way to predict an IRES in a data set of unknown sequences? This review examines what is known about cellular IRES structures, as well as the data sets and tools available to examine this question. We find that the lengths, number of upstream AUGs, and %GC content of 5'-UTRs of the human transcriptome have a similar distribution to those of published IRES-containing UTRs. Although the UTRs containing IRESes are on average longer, almost half of all 5'-UTRs are long enough to contain an IRES. Examination of the available RNA structure prediction software and RNA motif searching programs indicates that, while these programs are useful tools to fine-tune the empirically determined RNA secondary structure, accurate de novo secondary structure prediction of large RNA molecules, and the subsequent identification of new IRES elements by computational approaches, is still not possible.


Subjects
5' Untranslated Regions/genetics, 5' Untranslated Regions/metabolism, 5' Untranslated Regions/chemistry, Algorithms, Animals, Base Sequence, Nucleic Acid Databases, Genetic Complementation Test, Humans, Molecular Models, Molecular Sequence Data, Nucleic Acid Conformation, Protein Biosynthesis, Viral RNA/genetics, Viral RNA/metabolism, RNA-Binding Proteins/metabolism, Ribosomes/metabolism, Software, Thermodynamics
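
The three 5'-UTR descriptors compared in the review abstract above (length, number of upstream AUGs, and %GC content) are simple to compute. The sketch below is a minimal illustration; the function name and the returned dictionary keys are assumptions, and the review itself does not describe its tooling.

```python
def utr_features(utr: str) -> dict:
    """Return length, upstream AUG count, and %GC for a 5'-UTR sequence."""
    seq = utr.upper().replace("T", "U")  # accept DNA or RNA input
    length = len(seq)
    # Count every AUG triplet, in any reading frame.
    upstream_augs = sum(1 for i in range(length - 2) if seq[i:i + 3] == "AUG")
    gc = sum(1 for base in seq if base in "GC")
    return {
        "length": length,
        "upstream_augs": upstream_augs,
        "gc_percent": 100.0 * gc / length if length else 0.0,
    }
```

Applying such a function to every 5'-UTR in a transcriptome and to a set of IRES-containing UTRs would give the two distributions that the review compares.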
11.
BMC Bioinformatics ; 7: 244, 2006 May 05.
Article in English | MEDLINE | ID: mdl-16677380

ABSTRACT

BACKGROUND: The identification of a consensus RNA motif often consists in finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior structural information. Hence, there is a need for tools to automate this process. RESULTS: We present an algorithm called Seed to identify all the conserved RNA secondary structure motifs in a set of unaligned sequences. The search space is defined as the set of all the secondary structure motifs inducible from a seed sequence. A general-to-specific search allows finding all the motifs that are conserved. Suffix arrays are used to enumerate efficiently all the biological palindromes as well as for the matching of RNA secondary structure expressions. We assessed the ability of this approach to uncover known structures using four datasets. The enumeration of the motifs relies only on the secondary structure definition and conservation, therefore allowing for the independent evaluation of scoring schemes. Twelve simple objective functions based on free energy were evaluated for their potential to discriminate native folds from the rest. CONCLUSION: Our evaluation shows that 1) support and exclusion constraints are sufficient to make an exhaustive search of the secondary structure space feasible; 2) the search space induced from a seed sequence contains known motifs; and 3) simple objective functions, consisting of a combination of the free energy of matching sequences, can generally identify motifs with high positive predictive value and sensitivity to known motifs.


Subjects
Algorithms, RNA/chemistry, RNA/genetics, Sequence Alignment/methods, RNA Sequence Analysis/methods, Base Sequence, Molecular Sequence Data, Nucleic Acid Conformation
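
The idea of using a suffix array to enumerate "biological palindromes" (segments that can pair with a reverse-complementary segment downstream), as mentioned in the Seed abstract above, can be sketched as follows. This is not the Seed algorithm: the suffix array is built naively, only Watson-Crick pairs are considered (no G-U wobble), the input is assumed to contain only A, C, G and U, and the parameters `k` and `min_loop` are illustrative assumptions.

```python
from bisect import bisect_left

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(s: str) -> str:
    return "".join(PAIR[b] for b in reversed(s))


def find_stems(seq: str, k: int = 4, min_loop: int = 3):
    """Return sorted (i, j) pairs where seq[i:i+k] can pair with seq[j:j+k].

    j must lie at least `min_loop` bases downstream of the first arm.
    """
    # Naive suffix array: start positions sorted by suffix (fine for short inputs).
    sa = sorted(range(len(seq)), key=lambda p: seq[p:])
    # Truncating sorted suffixes to k characters keeps them in sorted order.
    prefixes = [seq[p:p + k] for p in sa]
    stems = []
    for i in range(len(seq) - k + 1):
        target = reverse_complement(seq[i:i + k])
        pos = bisect_left(prefixes, target)
        # All suffixes starting with `target` are contiguous in the array.
        while pos < len(prefixes) and prefixes[pos] == target:
            j = sa[pos]
            if j >= i + k + min_loop:
                stems.append((i, j))
            pos += 1
    return sorted(stems)
```

Each reported pair is only a candidate helix; deciding which candidates form a conserved motif is the part that the scoring and conservation constraints described in the abstract would handle.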
12.
Int J Bioinform Res Appl ; 1(2): 230-45, 2005.
Article in English | MEDLINE | ID: mdl-18048133

ABSTRACT

Comparative RNA sequence analyses have contributed remarkably accurate predictions, and the recent determination of the 30S and 50S ribosomal subunits has brought further supporting evidence. Several inference tools combine free energy minimisation and comparative analysis to improve the quality of secondary structure predictions. This paper investigates the following hypotheses: the use of three input sequences improves the average accuracy compared to predictions based on two input sequences; the worst prediction (minimum accuracy) for any sequence should be more accurate when three input sequences are used rather than two; finally, the consensus structure of three sequences is probably less representative of the individual sequences, so the average coverage should be lower.


Subjects
Nucleic Acid Conformation, RNA, Algorithms, RNA/chemistry, RNA Sequence Analysis