1.
J Comput Biol ; 21(12): 947-63, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25393923

ABSTRACT

Spaced seeds have recently been shown not only to detect more alignments, but also to give a more accurate measure of phylogenetic distances and to provide a lower misclassification rate when used with Support Vector Machines (SVMs). We confirm these two results by independent experiments and propose in this article a coverage criterion to measure seed efficiency in both cases, in order to design better seed patterns. We first show how this coverage criterion can be measured directly by a full automaton-based approach. We then illustrate how the criterion performs against two other frequently used criteria, namely the single-hit and multiple-hit criteria, through correlation coefficients with the correct classification and with the true distance. Finally, for alignment-free distances, we propose an extension that adopts the coverage criterion, show how it performs, and indicate how it can be computed efficiently.
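
As an illustration of the three criteria named in the abstract, the following minimal Python sketch evaluates a spaced seed on a single alignment. It rests on assumptions that are not stated in the abstract: the alignment is encoded as a string of '1' (match) and '0' (mismatch), the seed as a string of '1' (must match) and '0' (don't care), and coverage is read as the number of alignment positions covered by a must-match position of at least one seed hit. The function names are hypothetical, and the code enumerates hits directly rather than using the automaton-based approach the article describes.

```python
def seed_hits(alignment: str, seed: str) -> list[int]:
    """Return every start position at which the seed hits the alignment.

    Assumed encoding: alignment uses '1' for a match and '0' for a mismatch;
    seed uses '1' for a must-match position and '0' for a don't-care position.
    """
    care = [j for j, symbol in enumerate(seed) if symbol == "1"]
    last_start = len(alignment) - len(seed)
    return [
        i
        for i in range(last_start + 1)
        if all(alignment[i + j] == "1" for j in care)
    ]


def single_hit(alignment: str, seed: str) -> bool:
    """Single-hit criterion: does the seed hit the alignment at least once?"""
    return len(seed_hits(alignment, seed)) > 0


def multiple_hit(alignment: str, seed: str) -> int:
    """Multiple-hit criterion: how many times does the seed hit?"""
    return len(seed_hits(alignment, seed))


def coverage(alignment: str, seed: str) -> int:
    """Coverage (assumed reading): number of distinct alignment positions
    covered by a must-match position of at least one seed hit."""
    care = [j for j, symbol in enumerate(seed) if symbol == "1"]
    covered = set()
    for i in seed_hits(alignment, seed):
        covered.update(i + j for j in care)
    return len(covered)


if __name__ == "__main__":
    alignment = "1101111011"  # hypothetical match/mismatch string
    seed = "1101"             # hypothetical spaced seed of weight 3
    print(seed_hits(alignment, seed))
    print(single_hit(alignment, seed), multiple_hit(alignment, seed))
    print(coverage(alignment, seed))
```

Under this reading, two seeds with the same number of hits can still differ in coverage when their hits overlap to different degrees, which is the kind of distinction a hit count alone does not capture.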


Subject(s)
Pattern Recognition, Automated; Sequence Alignment; Support Vector Machine; Algorithms; Humans; Sequence Analysis, Protein; Sequence Homology, Amino Acid; Software
2.
Bull Math Biol ; 68(8): 2353-64, 2006 Nov.
Article in English | MEDLINE | ID: mdl-16924430

ABSTRACT

Tandem repeats play many important roles in biological research. However, accurate characterization of their properties is limited by the inability to easily detect them. For this reason, much work has been devoted to developing detection algorithms. A widely used algorithm for detecting tandem repeats is the "tandem repeats finder" (Benson, G., Nucleic Acids Res. 27, 573-580, 1999). In that algorithm, tandem repeats are modeled by percent matches and frequency of indels between adjacent pattern copies, and statistical criteria are used to recognize them. We give a method for computing the exact joint distribution of a pair of statistics that are used in the testing procedures of the "tandem repeats finder": the total number of matches in matching tuples of length k or longer, and the total number of observations from the beginning of the first such matching tuple to the end of the last one. This allows the computation of the conditional distribution of the latter statistic given the former, a conditional distribution that is used to test for tandem repeats as opposed to non-tandem direct repeats. The setting is a Markovian sequence of a general order. Current approaches to this distributional problem deal only with independent trials and are based on approximations via simulation.
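
The pair of statistics described in the abstract can be made concrete with a short Python sketch. It assumes, beyond what the abstract states, that the observations are given as a sequence of integers with 1 for a match and 0 for a mismatch, and that a "matching tuple of length k or longer" is a maximal run of at least k consecutive matches. The function name `run_statistics` is hypothetical. The sketch only evaluates the two statistics on one observed sequence; it does not compute their joint distribution, which is the contribution of the article.

```python
from itertools import groupby
from typing import Sequence


def run_statistics(matches: Sequence[int], k: int) -> tuple[int, int]:
    """Compute the two statistics on a 0/1 sequence (1 = match).

    Returns (total, extent), where
      total  = number of matches inside maximal runs of length >= k,
      extent = number of observations from the start of the first such run
               to the end of the last one, inclusive (0 if there is none).
    """
    position = 0          # index of the first element of the current run
    total = 0
    first_start = None    # start index of the first qualifying run
    last_end = None       # index one past the end of the last qualifying run

    for value, group in groupby(matches):
        length = sum(1 for _ in group)
        if value == 1 and length >= k:
            total += length
            if first_start is None:
                first_start = position
            last_end = position + length
        position += length

    extent = 0 if first_start is None else last_end - first_start
    return total, extent


if __name__ == "__main__":
    observed = [1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1]  # hypothetical data
    print(run_statistics(observed, k=3))
```

In the hypothetical sequence above with k = 3, only the first run of three matches and the later run of four matches qualify, so shorter runs contribute to the extent when they fall between qualifying runs but never to the total.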


Subject(s)
Algorithms; DNA/genetics; Models, Genetic; Tandem Repeat Sequences; Humans; Markov Chains