Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 9 de 9
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Philos Trans A Math Phys Eng Sci ; 372(2016): 20130133, 2014 May 28.
Artigo em Inglês | MEDLINE | ID: mdl-24751867

RESUMO

In this paper, we provide an O(n log(2) n log log n log* n) algorithm to compute a duplication history of a string under no-breakpoint-reuse condition. The motivation of this problem stems from computational biology, in particular, from analysis of complex gene clusters. The problem is also related to computing edit distance with block operations, but, in our scenario, the start of the history is not fixed, but chosen to minimize the distance measure.

2.
BMC Bioinformatics ; 15: 404, 2014 Dec 31.
Artigo em Inglês | MEDLINE | ID: mdl-25551362

RESUMO

BACKGROUND: Identifying sequence-structure motifs common to two RNAs can speed up the comparison of structural RNAs substantially. The core algorithm of the existent approach ExpaRNA solves this problem for a priori known input structures. However, such structures are rarely known; moreover, predicting them computationally is no rescue, since single sequence structure prediction is highly unreliable. RESULTS: The novel algorithm ExpaRNA-P computes exactly matching sequence-structure motifs in entire Boltzmann-distributed structure ensembles of two RNAs; thereby we match and fold RNAs simultaneously, analogous to the well-known "simultaneous alignment and folding" of RNAs. While this implies much higher flexibility compared to ExpaRNA, ExpaRNA-P has the same very low complexity (quadratic in time and space), which is enabled by its novel structure ensemble-based sparsification. Furthermore, we devise a generalized chaining algorithm to compute compatible subsets of ExpaRNA-P's sequence-structure motifs. Resulting in the very fast RNA alignment approach ExpLoc-P, we utilize the best chain as anchor constraints for the sequence-structure alignment tool LocARNA. ExpLoc-P is benchmarked in several variants and versus state-of-the-art approaches. In particular, we formally introduce and evaluate strict and relaxed variants of the problem; the latter makes the approach sensitive to compensatory mutations. Across a benchmark set of typical non-coding RNAs, ExpLoc-P has similar accuracy to LocARNA but is four times faster (in both variants), while it achieves a speed-up over 30-fold for the longest benchmark sequences (≈400nt). Finally, different ExpLoc-P variants enable tailoring of the method to specific application scenarios. ExpaRNA-P and ExpLoc-P are distributed as part of the LocARNA package. The source code is freely available at http://www.bioinf.uni-freiburg.de/Software/ExpaRNA-P . CONCLUSIONS: ExpaRNA-P's novel ensemble-based sparsification reduces its complexity to quadratic time and space. Thereby, ExpaRNA-P significantly speeds up sequence-structure alignment while maintaining the alignment quality. Different ExpaRNA-P variants support a wide range of applications.


Assuntos
Algoritmos , Dobramento de RNA , Homologia de Sequência do Ácido Nucleico , RNA/química , Análise de Sequência de RNA , Software
3.
Artigo em Inglês | MEDLINE | ID: mdl-26355520

RESUMO

Detecting local common sequence-structure regions of RNAs is a biologically important problem. Detecting such regions allows biologists to identify functionally relevant similarities between the inspected molecules. We developed dynamic programming algorithms for finding common structure-sequence patterns between two RNAs. The RNAs are given by their sequence and a set of potential base pairs with associated probabilities. In contrast to prior work on local pattern matching of RNAs, we support the breaking of arcs. This allows us to add flexibility over matching only fixed structures; potentially matching only a similar subset of specified base pairs. We present an O(n(3)) algorithm for local exact pattern matching between two nested RNAs, and an O(n(3) log n) algorithm for one nested RNA and one bounded-unlimited RNA. In addition, an algorithm for approximate pattern matching is introduced that for two given nested RNAs and a number k, finds the maximal local pattern matching score between the two RNAs with at most k mismatches in O(n(3)k(2)) time. Finally, we present an O(n(3)) algorithm for finding the most similar subforest between two nested RNAs.


Assuntos
Biologia Computacional/métodos , Reconhecimento Automatizado de Padrão/métodos , RNA/química , Análise de Sequência de RNA/métodos , Algoritmos , Conformação de Ácido Nucleico
4.
Artigo em Inglês | MEDLINE | ID: mdl-20733241

RESUMO

The haplotype inference problem (HIP) asks to find a set of haplotypes which resolve a given set of genotypes. This problem is important in practical fields such as the investigation of diseases or other types of genetic mutations. In order to find the haplotypes which are as close as possible to the real set of haplotypes that comprise the genotypes, two models have been suggested which are by now well-studied: The perfect phylogeny model and the pure parsimony model. All known algorithms up till now for haplotype inference may find haplotypes that are not necessarily plausible, i.e., very rare haplotypes or haplotypes that were never observed in the population. In order to overcome this disadvantage, we study in this paper, a new constrained version of HIP under the above-mentioned models. In this new version, a pool of plausible haplotypes H is given together with the set of genotypes G, and the goal is to find a subset H ⊆ H that resolves G. For constrained perfect phlogeny haplotyping (CPPH), we provide initial insights and polynomial-time algorithms for some restricted cases of the problem. For constrained parsimony haplotyping (CPH), we show that the problem is fixed parameter tractable when parameterized by the size of the solution set of haplotypes.


Assuntos
Algoritmos , Haplótipos , Genótipo , Humanos , Filogenia , Polimorfismo de Nucleotídeo Único
5.
J Comput Biol ; 14(8): 1074-87, 2007 Oct.
Artigo em Inglês | MEDLINE | ID: mdl-17985988

RESUMO

Locality is an important and well-studied notion in comparative analysis of biological sequences. Similarly, taking into account affine gap penalties when calculating biological sequence alignments is a well-accepted technique for obtaining better alignments. When dealing with RNA, one has to take into consideration not only sequential features, but also structural features of the inspected molecule. This makes the computation more challenging, and usually prohibits the comparison only to small RNAs. In this paper we introduce two local metrics for comparing RNAs that extend the Smith-Waterman metric and its normalized version used for string comparison. We also present a global RNA alignment algorithm which handles affine gap penalties. Our global algorithm runs in O(m(2)n(1 + lg n/m)) time, while our local algorithms run in O(m(2)n(1 + lg n/m)) and O(n(2)m) time, respectively, where m

Assuntos
Algoritmos , RNA/química , RNA/genética , Alinhamento de Sequência/estatística & dados numéricos , Biologia Computacional
6.
J Comput Biol ; 13(5): 1013-27, 2006 Jun.
Artigo em Inglês | MEDLINE | ID: mdl-16796548

RESUMO

Gene structure prediction is one of the most important problems in computational molecular biology. It involves two steps: the first is finding the evidence (e.g., predicting splice sites) and the second is interpreting the evidence, that is, trying to determine the whole gene structure by assembling its pieces. In this paper, we suggest a combinatorial solution to the second step, which is also referred to as the "Exon Assembly Problem." We use a similarity-based approach that aims to produce a single gene structure based on similarities to a known homologous sequence. We target the sparse case, where filtering has been applied to the data, resulting in a set of O(n) candidate exon blocks. Our algorithm yields an O(n(2) square root of n) solution.


Assuntos
Algoritmos , Éxons/genética , Reconhecimento Automatizado de Padrão , Análise de Sequência de DNA , Software , Biologia Computacional
7.
J Comput Biol ; 12(10): 1289-306, 2005 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-16379535

RESUMO

Permutations on strings representing gene clusters on genomes have been studied earlier by Uno and Yagiura (2000), Heber and Stoye (2001), Bergeron et al. (2002), Eres et al. (2003), and Schmidt and Stoye (2004) and the idea of a maximal permutation pattern was introduced by Eres et al. (2003). In this paper, we present a new tool for representation and detection of gene clusters in multiple genomes, using PQ trees (Booth and Leuker, 1976): this describes the inner structure and the relations between clusters succinctly, aids in filtering meaningful from apparently meaningless clusters, and also gives a natural and meaningful way of visualizing complex clusters. We identify a minimal consensus PQ tree and prove that it is equivalent to a maximal pi pattern (Eres et al., 2003) and each subgraph of the PQ tree corresponds to a nonmaximal permutation pattern. We present a general scheme to handle multiplicity in permutations and also give a linear time algorithm to construct the minimal consensus PQ tree. Further, we demonstrate the results on whole genome datasets. In our analysis of the whole genomes of human and rat, we found about 1.5 million common gene clusters but only about 500 minimal consensus PQ trees, with E. Coli K-12 and B. Subtilis genomes, we found only about 450 minimal consensus PQ trees out of about 15,000 gene clusters, and when comparing eight different Chloroplast genomes, we found only 77 minimal consensus PQ trees out of about 6,700 gene clusters. Further, we show specific instances of functionally related genes in two of the cases.


Assuntos
Genômica/métodos , Modelos Genéticos , Algoritmos , Animais , Bacillus subtilis/genética , Biologia Computacional , DNA de Cloroplastos/genética , Escherichia coli K12/genética , Genoma Bacteriano , Genoma Humano , Genômica/estatística & dados numéricos , Humanos , Família Multigênica , Ratos
8.
J Comput Biol ; 11(6): 1050-60, 2004.
Artigo em Inglês | MEDLINE | ID: mdl-15662197

RESUMO

Functionally related genes often appear in each other's neighborhood on the genome; however, the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon and may also aid in function prediction of genes. Such gene clusters also aid toward solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In the paper, we address the problem of automatically discovering clusters of entities, be they genes or domains: we formalize the abstract problem as a discovery problem called the (pi)pattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns, without any loss of information, We demonstrate the automatic pattern discovery tool on motifs on E. Coli protein sequences.


Assuntos
Biologia Computacional , Análise de Sequência de DNA/estatística & dados numéricos , Análise de Sequência de Proteína/estatística & dados numéricos , Algoritmos , Interpretação Estatística de Dados , Família Multigênica , Sintenia
9.
Bioinformatics ; 18(5): 679-88, 2002 May.
Artigo em Inglês | MEDLINE | ID: mdl-12050064

RESUMO

MOTIVATION: One of the major features of genomic DNA sequences, distinguishing them from texts in most spoken or artificial languages, is their high repetitiveness. Variation in the repetitiveness of genomic texts reflects the presence and density of different biologically important messages. Thus, deviation from an expected number of repeats in both directions indicates a possible presence of a biological signal. Linguistic complexity corresponds to repetitiveness of a genomic text, and potential regulatory sites may be discovered through construction of typical patterns of complexity distribution. RESULTS: We developed software for fast calculation of linguistic sequence complexity of DNA sequences. Our program utilizes suffix trees to compute the number of subwords present in genomic sequences, thereby allowing calculation of linguistic complexity in time linear in genome size. The measure of linguistic complexity was applied to the complete genome of Haemophilus influenzae. Maps of complexity along the entire genome were obtained using sliding windows of 40, 100, and 2000 nucleotides. This approach provided an efficient way to detect simple sequence repeats in this genome. In addition, local profiles of complexity distribution around the starts of translation were constructed for 21 complete prokaryotic genomes. We hypothesize that complexity profiles correspond to evolutionary relationships between organisms. We found principal differences in profiles of the GC-rich and other (non-GC-rich) genomes. We also found characteristic differences in profiles of AT genomes, which probably reflect individual species variations in translational regulation. AVAILABILITY: The program is available upon request from Alexander Bolshoy or at http://csweb.haifa.ac.il/library/#complex.


Assuntos
Algoritmos , Genoma , Modelos Genéticos , Sequências Repetitivas de Ácido Nucleico/genética , Análise de Sequência de DNA/métodos , Sequência de Bases , Metodologias Computacionais , Bases de Dados Genéticas , Árvores de Decisões , Estudos de Viabilidade , Haemophilus influenzae/genética , Dados de Sequência Molecular , Células Procarióticas , Controle de Qualidade , Sensibilidade e Especificidade , Fatores de Tempo
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...