Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 2 de 2
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
J Comput Biol ; 17(12): 1697-709, 2010 Dec.
Artigo em Inglês | MEDLINE | ID: mdl-21128856

RESUMO

Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position-specific weight matrices (PSWMs), and co-occurrences of pairs of motifs.


Assuntos
Motivos de Aminoácidos , Reconhecimento Automatizado de Padrão , Sequências Reguladoras de Ácido Nucleico , Análise de Sequência/métodos , Sequência de Aminoácidos , Sequência de Bases , Sequências Repetidas Invertidas , Método de Monte Carlo , Matrizes de Pontuação de Posição Específica
2.
J Comput Biol ; 12(3): 331-54, 2005 Apr.
Artigo em Inglês | MEDLINE | ID: mdl-15857246

RESUMO

Palindromes are symmetrical words of DNA in the sense that they read exactly the same as their reverse complementary sequences. Representing the occurrences of palindromes in a DNA molecule as points on the unit interval, the scan statistics can be used to identify regions of unusually high concentration of palindromes. These regions have been associated with the replication origins on a few herpesviruses in previous studies. However, the use of scan statistics requires the assumption that the points representing the palindromes are independently and uniformly distributed on the unit interval. In this paper, we provide a mathematical basis for this assumption by showing that in randomly generated DNA sequences, the occurrences of palindromes can be approximated by a Poisson process. An easily computable upper bound on the Wasserstein distance between the palindrome process and the Poisson process is obtained. This bound is then used as a guide to choose an optimal palindrome length in the analysis of a collection of 16 herpesvirus genomes. Regions harboring significant palindrome clusters are identified and compared to known locations of replication origins. This analysis brings out a few interesting extensions of the scan statistics that can help formulate an algorithm for more accurate prediction of replication origins.


Assuntos
Genoma Viral , Herpesviridae/genética , Sequências Repetitivas de Ácido Nucleico , Interpretação Estatística de Dados , Humanos , Análise de Sequência de DNA
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...