Pesquisa | Portal Regional da BVS (teste)

Efficient edit distance with duplications and contractions.

Pinhas, Tamar; Zakov, Shay; Tsur, Dekel; Ziv-Ukelson, Michal.

Algorithms Mol Biol ; 8(1): 27, 2013 Oct 29.

Artigo em Inglês | MEDLINE | ID: mdl-24168705

RESUMO

: We propose three algorithms for string edit distance with duplications and contractions. These include an efficient general algorithm and two improvements which apply under certain constraints on the cost function. The new algorithms solve a more general problem variant and obtain better time complexities with respect to previous algorithms. Our general algorithm is based on min-plus multiplication of square matrices and has time and space complexities of O (|Σ|MP (n)) and O (|Σ|n2), respectively, where |Σ| is the alphabet size, n is the length of the strings, and MP (n) is the time bound for the computation of min-plus matrix multiplication of two n × n matrices (currently, MP(n)=On3log3lognlog2n due to an algorithm by Chan).For integer cost functions, the running time is further improved to O|Σ|n3log2n. In addition, this variant of the algorithm is online, in the sense that the input strings may be given letter by letter, and its time complexity bounds the processing time of the first n given letters. This acceleration is based on our efficient matrix-vector min-plus multiplication algorithm, intended for matrices and vectors for which differences between adjacent entries are from a finite integer interval D. Choosing a constant 1log|D|n<λ<1, the algorithm preprocesses an n × n matrix in On2+λ|D| time and On2+λ|D|λ2log|D|2n space. Then, it may multiply the matrix with any given n-length vector in On2λ2log|D|2n time. Under some discreteness assumptions, this matrix-vector min-plus multiplication algorithm applies to several problems from the domains of context-free grammar parsing and RNA folding and, in particular, implies the asymptotically fastest On3log2n time algorithm for single-strand RNA folding with discrete cost functions.Finally, assuming a different constraint on the cost function, we present another version of the algorithm that exploits the run-length encoding of the strings and runs in O|Σ|nMP(ñ)ñ time and O(|Σ|nñ) space, where ñ is the length of the run-length encoding of the strings.

Regular language constrained sequence alignment revisited.

Kucherov, Gregory; Pinhas, Tamar; Ziv-Ukelson, Michal.

J Comput Biol ; 18(5): 771-81, 2011 May.

Artigo em Inglês | MEDLINE | ID: mdl-21554020

RESUMO

Imposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, which proposed an O(n²t4) time and O(n²t²) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input non-deterministic automaton. A faster O(n²t³) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst case time complexity bound to O(n²t³)/log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve the worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense.

Assuntos

Algoritmos , Biologia Computacional/métodos , Alinhamento de Sequência/métodos , Sequência de Aminoácidos , Bases de Dados Genéticas , Dados de Sequência Molecular , Proteínas/química , Proteínas/genética

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA