1.
J Comput Biol ; 14(5): 594-614, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17683263

ABSTRACT

We present a novel approach to managing redundancy in sequence databanks such as GenBank. We store clusters of near-identical sequences as a representative union-sequence and a set of corresponding edits to that sequence. During search, the query is compared to only the union-sequences representing each cluster; cluster members are then only reconstructed and aligned if the union-sequence achieves a sufficiently high score. Using this approach with BLAST results in a 27% reduction in collection size and a corresponding 22% decrease in search time with no significant change in accuracy. We also describe our method for clustering that uses fingerprinting, an approach that has been successfully applied to collections of text and web documents in Information Retrieval. Our clustering approach is ten times faster on the GenBank nonredundant protein database than the fastest existing approach, CD-HIT. We have integrated our approach into FSA-BLAST, our new Open Source version of BLAST (available from http://www.fsa-blast.org/). As a result, FSA-BLAST is twice as fast as NCBI-BLAST with no significant change in accuracy.
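The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the FSA-BLAST implementation: the abstract does not specify the edit format or the scoring, so the sketch assumes substitution-only edits stored as (position, residue) pairs and uses a toy shared-3-mer count in place of a BLAST alignment score. The names `Cluster`, `toy_score`, and `search` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """A cluster of near-identical sequences (hypothetical layout)."""
    union: str  # representative sequence for the whole cluster
    # member id -> substitutions (position, residue) relative to `union`.
    # Assumption: substitution-only edits; the real edit format may differ.
    edits: Dict[str, List[Tuple[int, str]]] = field(default_factory=dict)

    def reconstruct(self, member_id: str) -> str:
        """Rebuild one member by applying its edits to the representative."""
        seq = list(self.union)
        for pos, residue in self.edits[member_id]:
            seq[pos] = residue
        return "".join(seq)


def toy_score(query: str, target: str, k: int = 3) -> int:
    """Stand-in for an alignment score: distinct query k-mers found in target."""
    kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sum(1 for kmer in kmers if kmer in target)


def search(query: str, clusters: List[Cluster], threshold: int) -> List[Tuple[str, int]]:
    """Score representatives first; expand a cluster only if it passes."""
    hits = []
    for cluster in clusters:
        if toy_score(query, cluster.union) < threshold:
            continue  # members of this cluster are never reconstructed
        for member_id in cluster.edits:
            member = cluster.reconstruct(member_id)
            hits.append((member_id, toy_score(query, member)))
    return sorted(hits, key=lambda hit: -hit[1])


# Illustrative data only; these sequences are made up.
clusters = [
    Cluster("MKTAYIAKQR", {"a": [], "b": [(3, "G")]}),
    Cluster("GGGGSSSSPP", {"c": []}),
]
results = search("KTAYIA", clusters, threshold=3)
```

The saving comes from the `continue` branch: a cluster whose representative scores below the threshold costs one comparison regardless of how many members it holds.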


Subjects
Databases, Protein; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Sequence Homology, Amino Acid; Amino Acid Sequence; Animals; Databases, Protein/trends; Humans; Molecular Sequence Data; Sequence Alignment/trends; Sequence Analysis, Protein/trends
2.
Appl Environ Microbiol ; 72(2): 1270-8, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16461676

ABSTRACT

Terminal restriction fragment length polymorphism (T-RFLP) analysis has the potential to be useful for comparisons of complex bacterial communities, especially to detect changes in community structure in response to different variables. To do this successfully, systematic variations have to be detected above method-associated noise, by standardizing data sets and assigning confidence estimates to relationships detected. We investigated the use of different standardizing methods in T-RFLP analysis of PCR-amplified 16S rRNA genes to elucidate the similarities between the bacterial communities in 17 soil and sediment samples. We developed a robust method for standardizing data sets that appeared to allow detection of similarities between complex bacterial communities. We term this the variable percentage threshold method. We found that making conclusions about the similarities of complex bacterial communities from T-RFLP profiles generated by a single restriction enzyme (RE) may lead to erroneous conclusions. Instead, the use of multiple REs, each individually, to generate multiple data sets allowed us to determine a confidence estimate for groupings of apparently similar communities and at the same time minimized the effects of RE selection. In conjunction with the variable percentage threshold method, this allowed us to make confident conclusions about the similarities of the complex bacterial communities in the 17 different samples.


Subjects
Bacteria/genetics; Ecosystem; Polymorphism, Restriction Fragment Length; Base Sequence; Confidence Intervals; DNA, Bacterial/genetics; Geologic Sediments/microbiology; Phylogeny; RNA, Bacterial/genetics; RNA, Ribosomal, 16S/genetics; Soil Microbiology